title: 'DeepLearning.ai Assignment: (1-4) -- Deep Neural Networks'
This week's assignment has two parts: the first part builds the basic building-block functions of a neural network, and the second part assembles them into a model and makes predictions.
The functions to build are described below.
Initialization uses:
w : np.random.randn(shape)*0.01
b : np.zeros(shape)
1. two-layer
First, write the two-layer initialization function, which we already wrote last week.
```python
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```
2. L-layer
Next, write the initialization function for an L-layer network. Its input is a list of layer sizes such as [12, 4, 3, 1], meaning 4 layers in total (counting the input layer):
```python
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
```
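As a quick sanity check, the function can be called with the [12, 4, 3, 1] example above and the resulting parameter shapes printed (a minimal sketch, assuming numpy has been imported as np):

```python
# Sanity check for initialize_parameters_deep (assumes numpy imported as np)
params = initialize_parameters_deep([12, 4, 3, 1])
for name in sorted(params):
    print(name, params[name].shape)
# Expected shapes: W1 (4, 12), W2 (3, 4), W3 (1, 3), b1 (4, 1), b2 (3, 1), b3 (1, 1)
```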
1. Linear Forward
Using the formula:
$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$
where $A^{[0]} = X$.
Here the inputs are A, W, and b, and the outputs are the computed Z together with cache = (A, W, b), which is stored for later use.
```python
def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
```
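A tiny shape check (with made-up dimensions: 2 previous units, 3 current units, 5 examples) confirms that Z holds one pre-activation value per unit and per example:

```python
# Hypothetical dimensions, just to verify the matrix product and the broadcasting of b
np.random.seed(0)
A_prev = np.random.randn(2, 5)   # (size of previous layer, number of examples)
W = np.random.randn(3, 2)        # (size of current layer, size of previous layer)
b = np.zeros((3, 1))             # (size of current layer, 1), broadcast over examples
Z, cache = linear_forward(A_prev, W, b)
print(Z.shape)                   # (3, 5)
```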
2. Linear-Activation Forward
This step takes the Z just computed and passes it through the activation function $A = g(Z)$, merging the linear and activation parts into one function.
At this point the notebook already provides ready-made sigmoid and relu functions that we only need to call. Their source code doesn't seem to be shown there, so it is pasted below; both output A and cache = Z:
```python
def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
```
```python
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well; stored for computing the backward pass efficiently
    """
    A = np.maximum(0, Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache
```
Then, using the linear_forward from before, we can write the forward function for a single layer. The inputs are $A^{[l-1]}$, W, and b, plus a string `activation` that says whether to use sigmoid or relu.
The outputs are $A^{[l]}$ and a cache, which at this point already contains four values: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.
```python
# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
```
3. L-Layer Model
This step chains the multi-layer network together from start to finish: the first L-1 layers use ReLU and the L-th layer uses sigmoid.
The inputs are X, i.e. $A^{[0]}$, and `parameters`, which holds W and b for every layer.
The outputs are the last layer's $A^{[L]}$, i.e. the prediction $\hat{Y}$, together with every layer's cache: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.
```python
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
              every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], 'relu')
        caches.append(cache)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], 'sigmoid')
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
```
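Chaining it with the earlier initializer gives a quick end-to-end shape check (the 12-feature, 7-example input below is made up purely for illustration):

```python
np.random.seed(2)
X = np.random.randn(12, 7)                            # 12 features, 7 examples (hypothetical)
parameters = initialize_parameters_deep([12, 4, 3, 1])
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))                          # (1, 7) 3 -> one probability per example, one cache per layer
```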
$$-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right)$$
Use np.multiply and np.sum to compute this cross-entropy cost:
```python
def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from aL and y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = - np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL))) / m
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
```
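A small worked example (made-up predictions and labels) shows what the formula produces:

```python
AL = np.array([[0.8, 0.9, 0.4]])   # hypothetical predicted probabilities
Y  = np.array([[1, 1, 0]])         # hypothetical true labels
# cost = -(log(0.8) + log(0.9) + log(1 - 0.4)) / 3 ≈ 0.2798
print(compute_cost(AL, Y))
```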
1. Linear backward
First, assume that $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ is already known; what we want to compute is $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.
The formulas are already given:
$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]\,T}$$
$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]\,T} dZ^{[l]}$$
The cache here is the linear cache: (A_prev, W, b).
```python
def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ### START CODE HERE ### (≈ 3 lines of code)
    dW = 1 / m * np.dot(dZ, A_prev.T)
    db = 1 / m * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    ### END CODE HERE ###

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db
```
2. Linear-Activation backward
From dA, the derivative of the activation function gives dZ, and the function above then does the rest. In the end:
Input: $dA^{[l]}$ and the cache.
Output: $dA^{[l-1]}$, $dW$, $db$.
Here the notebook again provides two ready-made functions, `dZ = sigmoid_backward(dA, activation_cache)` and `dZ = relu_backward(dA, activation_cache)`. Their source is below; both take dA and cache = Z as inputs and return dZ:
$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$
```python
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ
```
```python
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
```
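A one-line check (made-up values) confirms that the gradient passes through where Z > 0 and is zeroed where Z <= 0:

```python
Z  = np.array([[1.0, -2.0, 3.0]])   # hypothetical pre-activations
dA = np.array([[0.5,  0.5, 0.5]])   # hypothetical upstream gradient
print(relu_backward(dA, Z))         # [[0.5 0.  0.5]]
```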
We then obtain the function below. Note that the cache here already holds four values: linear_cache = (A_prev, W, b) and activation_cache = Z:
```python
# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    return dA_prev, dW, db
```
3. L-Model Backward
Now we can chain the previous functions together and propagate from the back towards the front: first compute the sigmoid step for the last layer, then loop backwards over the L-1 ReLU layers. Here dAL is the derivative of the loss with respect to AL, which is known in advance, namely
$$dA^{[L]} = -\left(\frac{y}{a} - \frac{1-y}{1-a}\right)$$
written in numpy as:
```python
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```
For the whole backward pass, the only inputs are AL, Y, and caches; the output is grads for every layer, containing $dA, dW, db$.
```python
# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
              every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
              the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')
    ### END CODE HERE ###

    # Loop from l=L-2 to l=0
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads['dA' + str(l+1)], current_cache, 'relu')
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###

    return grads
```
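As a sanity check of the bookkeeping, one backward pass on a small random example should produce one dA, dW, db entry per layer (the data below is again made up):

```python
np.random.seed(2)
X = np.random.randn(12, 7)                              # hypothetical inputs
Y = (np.random.rand(1, 7) > 0.5).astype(float)          # hypothetical labels
parameters = initialize_parameters_deep([12, 4, 3, 1])
AL, caches = L_model_forward(X, parameters)
grads = L_model_backward(AL, Y, caches)
print(sorted(grads.keys()))
# ['dA0', 'dA1', 'dA2', 'dW1', 'dW2', 'dW3', 'db1', 'db2', 'db3']
```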
```python
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- step size of the gradient descent update

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(L):
        parameters["W" + str(l+1)] -= learning_rate * grads['dW' + str(l+1)]
        parameters["b" + str(l+1)] -= learning_rate * grads['db' + str(l+1)]
    ### END CODE HERE ###

    return parameters
```
With the functions from Part 1, it is easy to build and train the model in Part 2.
The task is, as before, recognizing pictures of cats.
First train with a two-layer network, which reaches an accuracy of 72%. The code is simply pasted here; the L-layer version is discussed in more detail afterwards.
```python
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288     # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)

# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    np.random.seed(1)
    grads = {}
    costs = []                       # to keep track of the cost
    m = X.shape[1]                   # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    ### START CODE HERE ### (≈ 1 line of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
        A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
        ### END CODE HERE ###

        # Compute cost
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(A2, Y)
        ### END CODE HERE ###

        # Initializing backward propagation
        dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        ### START CODE HERE ### (≈ 2 lines of code)
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
        ### END CODE HERE ###

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        ### START CODE HERE ### (approx. 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
The L-layer model uses the functions written earlier:
```python
def initialize_parameters_deep(layers_dims):
    ...
    return parameters
def L_model_forward(X, parameters):
    ...
    return AL, caches
def compute_cost(AL, Y):
    ...
    return cost
def L_model_backward(AL, Y, caches):
    ...
    return grads
def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
```
Here there are 4 layers in total:
```python
layers_dims = [12288, 20, 7, 5, 1]  # 4-layer model
```
The approach is:
```python
# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []                         # keep track of cost

    # Parameters initialization. (≈ 1 line of code)
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
With 2500 iterations, the accuracy reaches 80%!
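The accuracy numbers come from the notebook's predict() helper, whose source isn't reproduced in this post. A minimal sketch of what such a function could look like (an assumption, not the official implementation) is simply a forward pass followed by thresholding at 0.5:

```python
def predict_sketch(X, y, parameters):
    # Forward pass through the trained network, then threshold the sigmoid output
    AL, _ = L_model_forward(X, parameters)
    predictions = (AL > 0.5).astype(int)
    print("Accuracy: " + str(np.mean(predictions == y)))
    return predictions
```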
Summary
The overall process is actually quite clear: first initialize the parameters, then start the loop. In each iteration, run forward propagation to obtain the last layer's AL and a cache for every layer (each cache contains A_prev, W, b, Z); compute the cost for that iteration; then run backward propagation to obtain every layer's gradients dA, dW, db, and update the parameters with them. Remember to record the cost every 100 iterations so that you can plot how it decreases.
The functions built in Part 1 are fairly simple when taken one step at a time, but they would be hard to come up with all at once on your own. So keep the train of thought clear and go step by step, and the functions can be built well!