DeepLearning.ai Homework: (1-4) Deep Neural Networks


title: 'DeepLearning.ai Homework: (1-4) Deep Neural Networks'
tags:
  - homework
categories:
  - AI
  - Deep Learning
date: 2018-09-13 17:59:43
id: 2018091318

First published on my personal blog: , you are welcome to visit.

  1. Don't copy the homework!
  2. I have only written up my approach here, for my own study.
  3. Don't copy the homework!

This week's assignment has two parts: Part 1 builds the basic building-block functions of a neural network, and only in Part 2 do we assemble the model and make predictions.

Part 1

The functions to build are:

  • Initialize the parameters
    • two-layer
    • L-layer
  • Forward propagation
    • Linear part: first build the purely linear computation
    • linear->activation: then build one layer's linear step plus its activation
    • L_model_forward function: finally chain L-1 ReLU layers with a final sigmoid layer
  • Compute loss
  • Backward propagation
    • Linear part
    • linear->activation
    • L_model_backward function

Initialization

The initialization uses:

w : np.random.randn(shape)*0.01

b : np.zeros(shape)

1. two-layer

First, the two-layer initialization function, which was already written in last week's assignment.

```python
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```

2. L-layer

Then an L-layer initialization function. Its input is a list such as [12, 4, 3, 1], describing 4 layer sizes in total (the input layer followed by 3 weight layers):

```python
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
```
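For example (my own quick check, not the notebook's test case), the [12, 4, 3, 1] list mentioned above gives three weight matrices whose shapes chain together:

```python
parameters = initialize_parameters_deep([12, 4, 3, 1])
for l in range(1, 4):
    print('W' + str(l), parameters['W' + str(l)].shape,
          'b' + str(l), parameters['b' + str(l)].shape)
# W1 (4, 12)  b1 (4, 1)
# W2 (3, 4)   b2 (3, 1)
# W3 (1, 3)   b3 (1, 1)
```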

Forward propagation module

1. Linear Forward

Using the formula:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$.

Here the inputs are A, W, b; the outputs are the computed Z and a cache = (A, W, b) that is stored for the backward pass.

```python
def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
```
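A minimal shape check with made-up values of my own:

```python
np.random.seed(0)
A_prev = np.random.randn(3, 5)   # 3 units in the previous layer, 5 examples
W = np.random.randn(4, 3)        # current layer has 4 units
b = np.zeros((4, 1))
Z, cache = linear_forward(A_prev, W, b)
print(Z.shape)   # (4, 5): one pre-activation per unit per example
```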

2. Linear-Activation Forward

Here the Z just computed is passed through the activation $A = g(Z)$, and the two steps are merged into a single function.

At this point the notebook already provides ready-made sigmoid and relu functions, so they only need to be called. Their source is not shown in the notebook; both return A together with cache = Z, so they are reproduced here:

```python
def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache


def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z ; stored for computing the backward pass efficiently
    """
    A = np.maximum(0, Z)
    assert(A.shape == Z.shape)
    cache = Z
    return A, cache
```

Then, using the earlier linear_forward, the forward function for one layer can be written. The inputs are $A^{[l-1]}, W, b$, plus a string activation that selects sigmoid or relu.

The outputs are $A^{[l]}$ and a cache, which by now holds 4 values: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.

```python
# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python dictionary containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert(A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
```

3. L-Layer Model

This step chains the whole multi-layer network together from start to finish: the first L-1 layers use ReLU, and the L-th (last) layer uses sigmoid.

The inputs are X, i.e. $A^{[0]}$, and parameters, which contains every layer's W and b.

The outputs are the last layer's $A^{[L]}$, i.e. the prediction $\hat{Y}$, plus every layer's cache: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.

```python
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L-1 of them, indexed from 0 to L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], 'relu')
        caches.append(cache)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], 'sigmoid')
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
```
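A quick way to convince yourself that the chaining works (my own check, with made-up sizes): AL has one probability per example, and caches ends up with one entry per weight layer:

```python
np.random.seed(1)
X = np.random.randn(12, 7)                          # 12 features, 7 examples
parameters = initialize_parameters_deep([12, 4, 3, 1])
AL, caches = L_model_forward(X, parameters)
print(AL.shape)      # (1, 7)
print(len(caches))   # 3: one (linear_cache, activation_cache) pair per layer
```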

Cost function

$$-\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right)$$

Use np.multiply and np.sum to compute the cross-entropy cost.

```python
def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from aL and y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = -np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL))) / m
    ### END CODE HERE ###

    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
```
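For intuition, a tiny hand-checkable example of my own: predicting 0.5 for every example gives a cost of $-\log(0.5) \approx 0.693$, while confident, mostly correct predictions give a cost close to 0:

```python
Y = np.array([[1, 0, 1]])
AL_uncertain = np.array([[0.5, 0.5, 0.5]])
AL_confident = np.array([[0.99, 0.01, 0.99]])
print(compute_cost(AL_uncertain, Y))   # ≈ 0.693 = -log(0.5)
print(compute_cost(AL_confident, Y))   # ≈ 0.010
```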

Backward propagation module

1. Linear backward

Start by assuming that $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$ is known; what we want to compute is $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.

The formulas are given:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l] (i)}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]}$$

The cache here is the linear cache: (A_prev, W, b).

```python
def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ### START CODE HERE ### (≈ 3 lines of code)
    dW = 1 / m * np.dot(dZ, A_prev.T)
    db = 1 / m * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    ### END CODE HERE ###

    assert(dA_prev.shape == A_prev.shape)
    assert(dW.shape == W.shape)
    assert(db.shape == b.shape)

    return dA_prev, dW, db
```

2. Linear-Activation backward

From dA, the derivative of the activation function gives dZ, and the function above then finishes the job. In the end:

Input: $dA^{[l]}$ and cache

Output: $dA^{[l-1]}, dW, db$

The notebook again provides two ready-made functions, dZ = sigmoid_backward(dA, activation_cache) and dZ = relu_backward(dA, activation_cache).

Their source is below. Both take dA and cache = Z as input and return dZ:

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$

```python
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)
    assert(dZ.shape == Z.shape)
    return dZ


def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.
    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0
    assert(dZ.shape == Z.shape)
    return dZ
```
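To see what relu_backward does in practice (a small example of my own): the upstream gradient is passed through unchanged wherever Z > 0 and zeroed wherever Z <= 0:

```python
Z = np.array([[-1.0, 2.0, 0.0, 3.0]])
dA = np.array([[10.0, 10.0, 10.0, 10.0]])
print(relu_backward(dA, Z))   # [[ 0. 10.  0. 10.]]
```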

This gives the function below. Note that the cache here already holds 4 values: linear_cache = (A_prev, W, b) and activation_cache = Z.

```python
# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    return dA_prev, dW, db
```

3. L-Model Backward

Now the earlier functions can be strung together and the gradients propagated from back to front: first the sigmoid of the last layer, then a loop over the L-1 ReLU layers going backwards. Here dAL is the derivative of the loss with respect to AL, which is known in advance:

$$dA^{[L]} = -\left(\frac{y}{a} - \frac{1-y}{1-a}\right)$$

Expressed in numpy:

```python
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```

For the whole backward pass, the only inputs are AL, Y and caches; the output is grads for every layer, containing $dA, dW, db$.

```python
# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)            # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)    # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')
    ### END CODE HERE ###

    # Loop from l=L-2 to l=0
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)]
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads['dA' + str(l + 1)], current_cache, 'relu')
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###

    return grads
```
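A structural check of my own, with toy shapes: after one forward and one backward pass, grads contains a dA/dW/db entry for every layer:

```python
np.random.seed(1)
X = np.random.randn(12, 7)
Y = (np.random.rand(1, 7) > 0.5).astype(int)
parameters = initialize_parameters_deep([12, 4, 3, 1])
AL, caches = L_model_forward(X, parameters)
grads = L_model_backward(AL, Y, caches)
print(sorted(grads.keys()))
# ['dA0', 'dA1', 'dA2', 'dW1', 'dW2', 'dW3', 'db1', 'db2', 'db3']
```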

Update Parameters

```python
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2   # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(L):
        parameters["W" + str(l + 1)] -= learning_rate * grads['dW' + str(l + 1)]
        parameters["b" + str(l + 1)] -= learning_rate * grads['db' + str(l + 1)]
    ### END CODE HERE ###

    return parameters
```
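The update is plain gradient descent, $W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}$ and $b^{[l]} := b^{[l]} - \alpha \, db^{[l]}$. A one-layer toy example of my own:

```python
parameters = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.3]])}
parameters = update_parameters(parameters, grads, learning_rate=0.1)
print(parameters["W1"])   # [[0.99 2.02]]
print(parameters["b1"])   # [[0.47]]
```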

Part 2

With the functions from Part 1, it is easy to build and train the model in Part 2.

The task is still recognizing cat pictures.

Training first with a two-layer network gives an accuracy of 72%. The code is simply pasted below; the L-layer model is discussed in more detail afterwards.

```python
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288     # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)

# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    np.random.seed(1)
    grads = {}
    costs = []                 # to keep track of the cost
    m = X.shape[1]             # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    ### START CODE HERE ### (≈ 1 line of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
        A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
        ### END CODE HERE ###

        # Compute cost
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(A2, Y)
        ### END CODE HERE ###

        # Initializing backward propagation
        dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        ### START CODE HERE ### (≈ 2 lines of code)
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
        ### END CODE HERE ###

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        ### START CODE HERE ### (approx. 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per tens)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
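The notebook then trains this model roughly as follows; train_x and train_y stand for the notebook's preprocessed cat dataset, so treat this as a sketch rather than a copy of the notebook cell:

```python
# Sketch only: train_x, train_y come from the notebook's data-loading cells.
parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y),
                             num_iterations=2500, print_cost=True)
```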

L-layer Neural Network

Reusing the functions written earlier:

```python
def initialize_parameters_deep(layers_dims):
    ...
    return parameters
def L_model_forward(X, parameters):
    ...
    return AL, caches
def compute_cost(AL, Y):
    ...
    return cost
def L_model_backward(AL, Y, caches):
    ...
    return grads
def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
```

Here the model has 4 layers in total:

```python
layers_dims = [12288, 20, 7, 5, 1]   # 4-layer model
```

The procedure is:

  1. Initialize the parameters
  2. Enter a for loop of n iterations:
    1. L_model_forward(X, parameters) to get AL and caches
    2. Compute the cost
    3. L_model_backward(AL, Y, caches) to compute grads
    4. update_parameters(parameters, grads, learning_rate) to update the parameters
    5. Record the cost every 100 iterations
  3. Plot the gradient-descent cost curve
```python
# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (number of examples, num_px * num_px * 3)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []                         # keep track of cost

    # Parameters initialization. (≈ 1 line of code)
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per tens)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
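For completeness, this is roughly how the L-layer model is trained and evaluated; the variable names train_x, train_y, test_x, test_y and the predict helper below are my own sketch, since the notebook ships its own preprocessed data and its own predict function:

```python
# Hypothetical driver code: assumes train_x/train_y/test_x/test_y are already
# flattened and normalized as in the notebook.
layers_dims = [12288, 20, 7, 5, 1]
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500, print_cost=True)

def predict(X, Y, parameters):
    """Run a forward pass and threshold the sigmoid output at 0.5."""
    AL, _ = L_model_forward(X, parameters)
    predictions = (AL > 0.5).astype(int)
    print("Accuracy: " + str(np.mean(predictions == Y)))
    return predictions

pred_train = predict(train_x, train_y, parameters)
pred_test = predict(test_x, test_y, parameters)
```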

With 2,500 iterations, the accuracy reaches 80%!

Summary

The process is actually very clear: first initialize the parameters; then enter the loop. In each iteration, run forward propagation to get the last layer's AL and every layer's cache (which holds A_prev, W, b, Z); compute the cost for that iteration; then run backward propagation to get every layer's gradients dA, dW, db; and remember to record the cost every 100 iterations so that you can plot how the cost decreases.

Building the Part 1 functions step by step is fairly straightforward, but coming up with all of them in one go on your own would be quite hard. So keep the overall structure clear and build the functions one step at a time!

Reposted from: http://serii.baihongyu.com/
