TensorFlow Notes (1)

A simple neural network

import tensorflow as tf


def training(X, Y):
    batch_size = 8
    dataset_size = len(X)  # number of training samples

    # Weights of a two-layer network: 2 inputs -> 3 hidden units -> 1 output
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
    y = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

    # Forward pass
    a = tf.matmul(x, w1)
    output = tf.matmul(a, w2)

    # Sigmoid output with a cross-entropy loss
    output = tf.sigmoid(output)
    loss = -tf.reduce_mean(y * tf.log(output) + (1-y) * tf.log(1-output))

    train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

    ###################################################################

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        steps = 5000
        for i in range(steps):
            # Pick the next mini-batch, wrapping around at the end of the data
            start = (i * batch_size) % dataset_size
            end = min(start + batch_size, dataset_size)

            sess.run(train_step, feed_dict={x: X[start:end], y: Y[start:end]})

            if i % 1000 == 0:
                # Report the loss on the whole dataset every 1000 steps
                loss_ = sess.run(loss, feed_dict={x: X, y: Y})
                print("After {} training steps, loss on all data is {}".format(i, loss_))

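This is a very simple neural network, but training a neural network in TensorFlow generally follows the same steps:

Define the network structure and the output of the forward pass
Define the loss function and choose the backpropagation optimization algorithm
Create a session (tf.Session) and repeatedly run the optimization algorithm on the training data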
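
To make the three steps concrete without needing TensorFlow, here is a minimal pure-Python sketch. It is not the network above: the one-weight model, the toy data, and the helper names (forward, mse_loss, mse_grad, train) are all assumptions made for illustration, and plain gradient descent stands in for Adam.

```python
# Step 1: model structure and forward pass (a single weight, y_hat = w * x).
def forward(w, xs):
    return [w * x for x in xs]


# Step 2: loss function (mean squared error) and its gradient with respect to w.
def mse_loss(preds, ys):
    return sum((p - t) ** 2 for p, t in zip(preds, ys)) / len(ys)


def mse_grad(w, xs, ys):
    return sum(2 * (w * x - t) * x for x, t in zip(xs, ys)) / len(xs)


# Step 3: repeatedly run the optimizer on mini-batches of the training data.
def train(X, Y, batch_size=2, learning_rate=0.01, steps=60):
    dataset_size = len(X)
    w = 0.0
    for i in range(steps):
        # Walk through the data batch by batch, wrapping around at the end.
        start = (i * batch_size) % dataset_size
        end = min(start + batch_size, dataset_size)
        w -= learning_rate * mse_grad(w, X[start:end], Y[start:end])
    return w


# Toy data generated from y = 2 * x, so the best weight is 2.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.0 * x for x in X]

w = train(X, Y)
print(w, mse_loss(forward(w, X), Y))
```

The batch indexing is the part that carries over directly: multiplying the step counter by batch_size and taking the remainder modulo dataset_size visits the batches in order and starts over once the data runs out.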

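Tuning the learning rate

The learning rate determines how far the parameters move on each update. If the step is too large, the parameters may bounce back and forth around the optimum. Decaying the learning rate during training helps avoid this:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

Here learning_rate is the initial learning rate, decay_rate is the decay factor, and decay_steps controls how fast the decay happens. In TensorFlow it can be implemented like this: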

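tf.train.exponential_decay(
    learning_rate,    # initial learning rate
    global_step,      # current training step
    decay_steps,      # decay period: after this many steps the rate has decayed to learning_rate * decay_rate
    decay_rate,       # decay factor, usually between 0 and 1
    staircase=False,  # default False; if True, global_step / decay_steps is truncated to an integer, giving a step-wise decay
    name=None
)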
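
To see what the formula does to the numbers, here is a small pure-Python sketch of the same computation. The function below is a stand-in written for this note, not the TensorFlow op itself, and the values 0.1, 0.96 and 100 are arbitrary choices.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Plain-Python version of learning_rate * decay_rate ^ (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        # Truncate to an integer so the rate only drops once per decay_steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


for step in (0, 50, 100, 150, 200):
    smooth = exponential_decay(0.1, step, 100, 0.96)
    stepped = exponential_decay(0.1, step, 100, 0.96, staircase=True)
    print(step, smooth, stepped)
```

With staircase=False the rate shrinks a little on every step. With staircase=True it stays constant within each block of decay_steps steps and then drops by a factor of decay_rate.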

L1 and L2 regularization

Implementing L1 and L2 regularization in TensorFlow is straightforward:

loss = tf.reduce_mean(tf.square(output - y)) + tf.contrib.layers.l2_regularizer(lambda_)(weights)
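
For a feel of the penalty terms themselves, here is a pure-Python sketch of the two regularizers. The helper names l1_penalty and l2_penalty are made up for this note. The division by 2 in the L2 version reflects my understanding that tf.contrib.layers.l2_regularizer is built on tf.nn.l2_loss, which halves the sum of squares; treat that detail as an assumption and check it against your TensorFlow version.

```python
def l1_penalty(scale, weights):
    # scale * sum of absolute values of all weights
    return scale * sum(abs(w) for row in weights for w in row)


def l2_penalty(scale, weights):
    # scale * (sum of squares) / 2, assuming the l2_loss convention
    return scale * sum(w * w for row in weights for w in row) / 2


weights = [[1.0, -2.0], [-3.0, 4.0]]
print(l1_penalty(0.5, weights))  # (1 + 2 + 3 + 4) * 0.5
print(l2_penalty(0.5, weights))  # (1 + 4 + 9 + 16) / 2 * 0.5
```

L1 grows linearly with the weights and tends to push some of them to exactly zero, while L2 grows quadratically and mainly penalizes large weights.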

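However, when the network or the program becomes more complex, the code that defines the network and the code that computes the loss may not live in the same function, and passing the weights around as variables gets inconvenient. In that case you can use the collections TensorFlow provides, which store a group of entities (such as tensors) on the computation graph (tf.Graph). The code is shown below: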

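def get_weight(shape, lambda_):
    # lambda is a reserved word in Python, so the parameter is named lambda_
    weight = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    # add_to_collection adds the L2 regularization loss of this weight to a collection; 'losses' is the collection name
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda_)(weight))

    return weight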

x = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.placeholder(tf.float32, shape=(None, 1))
batch_size = 8
layer_dimension = [2, 10, 10, 10, 1]
n_layers = len(layer_dimension)

# The layer the forward pass has reached so far
curr_layer = x
# Number of nodes in the current layer (input dimension)
in_dimension = layer_dimension[0]

# Build a 5-layer fully connected network in a loop
for i in range(1, n_layers):
    # Number of nodes in the next layer (output dimension)
    out_dimension = layer_dimension[i]
    weight = get_weight([in_dimension, out_dimension], 0.001)
    bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))

    curr_layer = tf.nn.relu(tf.matmul(curr_layer, weight) + bias)
    in_dimension = out_dimension

# Add the loss that measures how the model does on the training data to the same collection
mse_loss = tf.reduce_mean(tf.square(curr_layer - y))
tf.add_to_collection('losses', mse_loss)

# get_collection() returns a list of all elements in the collection; these are the different
# parts of the loss, and adding them up gives the final loss function
loss = tf.add_n(tf.get_collection('losses'))
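
The collection pattern is easier to see with the TensorFlow parts stripped away. The sketch below imitates it in plain Python: a module-level dictionary stands in for the graph's collections, and plain floats stand in for tensors. All names here are local stand-ins that only borrow the TensorFlow names for readability.

```python
from collections import defaultdict

# Hypothetical stand-in for the graph's collections: name -> list of values.
_collections = defaultdict(list)


def add_to_collection(name, value):
    _collections[name].append(value)


def get_collection(name):
    # Return a copy so callers cannot modify the stored list by accident.
    return list(_collections[name])


def get_weight(values, lambda_):
    # Register the L2 penalty as a side effect, then hand back the weights.
    penalty = lambda_ * sum(w * w for w in values) / 2
    add_to_collection('losses', penalty)
    return values


# The code that builds the layers never has to return the penalties.
w1 = get_weight([1.0, -2.0], 0.5)  # registers 0.5 * 5 / 2 = 1.25
w2 = get_weight([3.0], 0.5)        # registers 0.5 * 9 / 2 = 2.25

# Elsewhere, the data loss is added to the same collection (0.5 is a made-up value).
add_to_collection('losses', 0.5)

total_loss = sum(get_collection('losses'))
print(total_loss)
```

The point of the design is that the function creating a weight and the function computing the final loss only need to agree on the collection name, not on a shared list of variables.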

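Variable management

When a program gets complex, passing parameters between functions becomes tedious. TensorFlow provides a mechanism for creating or fetching a variable by name, mainly through tf.get_variable() and tf.variable_scope().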

# The following two definitions are equivalent
var = tf.get_variable('var', shape=[1], initializer=tf.constant_initializer(1.0))
var = tf.Variable(tf.constant(1.0, shape=[1]), name='var')

For tf.get_variable(), the variable name is a required argument. tf.variable_scope() is a context manager.

with tf.variable_scope('foo'):
    var = tf.get_variable('var', shape=[1], initializer=tf.constant_initializer(1.0))

# A variable named var already exists in the scope foo, so the code below raises an error:
# Variable foo/var already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
with tf.variable_scope('foo'):
    var = tf.get_variable('var', shape=[1], initializer=tf.constant_initializer(1.0))

# When the context manager is created with reuse=True, tf.get_variable can only fetch variables that already exist.
# No variable var has been created in the scope bar yet, so this raises an error.
with tf.variable_scope('bar', reuse=True):
    var = tf.get_variable('var')

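When tf.variable_scope() is called with reuse=True, tf.get_variable() inside that context manager fetches an already-created variable directly and raises an error if the variable does not exist. With reuse=False or reuse=None, tf.get_variable() creates a new variable and raises an error if a variable with that name already exists. The following example shows a small practical use: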

def inference(input_tensor, reuse=False):
    # The reuse argument decides whether to create new variables or use existing ones.
    # New variables must be created the first time the network is built; later calls can
    # simply pass reuse=True, so weights and biases no longer need to be passed in each time.
    # in_dim, out_dim1, out_dim2 and the initializer (xxx) are placeholders left unspecified here.
    with tf.variable_scope('layer1', reuse=reuse):
        weights = tf.get_variable('weights', [in_dim, out_dim1], initializer=xxx)
        bias = tf.get_variable('bias', [out_dim1], initializer=xxx)
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights) + bias)

    with tf.variable_scope('layer2', reuse=reuse):
        weights = tf.get_variable('weights', [out_dim1, out_dim2], initializer=xxx)
        bias = tf.get_variable('bias', [out_dim2], initializer=xxx)
        layer2 = tf.nn.relu(tf.matmul(layer1, weights) + bias)

    return layer2
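
The create-or-reuse rules can also be mimicked in a few lines of plain Python, which may help when reasoning about which call fails and why. The VariableStore class below is a hypothetical stand-in written for this note. It only models the two rules described above (reuse=True fetches and fails if the variable is missing, otherwise create and fail if it exists) and ignores everything else TensorFlow does, such as tf.AUTO_REUSE.

```python
class VariableStore:
    """Toy registry keyed by 'scope/name', mimicking the reuse rules."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, initial_value=None, reuse=False):
        full_name = scope + "/" + name
        if reuse:
            # reuse=True: only fetch, never create.
            if full_name not in self._vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self._vars[full_name]
        # reuse=False: only create, never fetch.
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = [initial_value]  # a list acts as a mutable box
        return self._vars[full_name]


store = VariableStore()
v1 = store.get_variable("foo", "var", initial_value=1.0)  # creates foo/var
v2 = store.get_variable("foo", "var", reuse=True)         # fetches the same object
print(v1 is v2)

try:
    store.get_variable("foo", "var", initial_value=1.0)   # already exists
except ValueError as err:
    print(err)

try:
    store.get_variable("bar", "var", reuse=True)          # never created
except ValueError as err:
    print(err)
```

This is also why inference() above takes a reuse flag: the first call builds the variables, and every later call asks for the same names back instead of creating duplicates.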