Summary: This article shows how to build and train a convolutional neural network from scratch with Tensorflow, so that this knowledge can serve as a building block for creating interesting deep learning applications.
0. Introduction
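In the past I mainly wrote about "traditional" machine learning, such as Naive Bayes classification, logistic regression and the Perceptron algorithm. Over the past year I have been studying deep learning techniques, so I would like to share how to build and train a convolutional neural network from scratch with Tensorflow. That way we can later use this knowledge as a building block for creating interesting deep learning applications.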
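For this you need to have Tensorflow installed (see the installation instructions), and you should have a basic understanding of Python programming and of the theory behind convolutional neural networks. Once Tensorflow is installed, you can run a small neural network without a GPU, but deeper neural networks need the computing power of a GPU.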
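There are many websites and courses on the Internet that explain how convolutional neural networks work, and some of them are quite good: well illustrated and easy to understand. I will not explain the same things again here, so before reading on, please make sure you already understand how convolutional neural networks work. For example:
What is a convolutional layer, and what is the filter of a convolutional layer?
What is an activation layer (a ReLU layer, which is the most widely used, a sigmoid activation or tanh)?
What is a pooling layer (max pooling / average pooling), and what is dropout?
How does stochastic gradient descent work?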
This article covers the following:
Tensorflow basics
1.1 Constants and variables
1.2 Graphs and sessions in Tensorflow
1.3 Placeholders and feed_dicts
Neural networks in Tensorflow
2.1 Introduction
2.2 Loading the data
2.3 Creating a simple one-layer neural network
2.4 The many faces of Tensorflow
2.5 Creating the LeNet5 convolutional neural network
2.6 Parameters that affect the output size of a layer
2.7 Adjusting the LeNet5 architecture
2.8 The impact of the learning rate and the optimizer
Deep neural networks in Tensorflow
3.1 AlexNet
3.2 VGG Net-16
3.3 AlexNet performance
Final words
1. Tensorflow basics
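Here I will give a short introduction for people who have never used Tensorflow before. If you want to start building neural networks right away, or are already familiar with Tensorflow, you can skip straight to section 2. If you want to learn more about Tensorflow, you can also have a look at this repository, or read lecture notes 1 and 2 of Stanford's CS20SI course.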
1.1 Constants and variables
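The most basic units in Tensorflow are constants, variables and placeholders.
The difference between tf.constant() and tf.Variable() is clear: a constant has a fixed value, and once it is set, its value cannot be changed. The value of a variable can be changed after it has been set, but its data type and shape cannot.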
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

#We can create constants and variables of different types.
#However, the different types do not mix well together.
a = tf.constant(2, tf.int16)
b = tf.constant(4, tf.float32)
c = tf.constant(8, tf.float32)
#the second positional argument of tf.Variable() is 'trainable', so the type is passed as dtype=
d = tf.Variable(2, dtype=tf.int16)
e = tf.Variable(4, dtype=tf.float32)
f = tf.Variable(8, dtype=tf.float32)
#we can perform computations on variables of the same type: e + f
#but the following can not be done: d + e
#everything in Tensorflow is a tensor, these can have different dimensions:
#0D, 1D, 2D, 3D, 4D, or nD-tensors
g = tf.constant(np.zeros(shape=(2,2), dtype=np.float32)) #does work
h = tf.zeros([11], tf.int16)
i = tf.ones([2,2], tf.float32)
j = tf.zeros([1000,4,3], tf.float64)
k = tf.Variable(tf.zeros([2,2], tf.float32))
l = tf.Variable(tf.zeros([5,6,5], tf.float32))
Besides tf.zeros() and tf.ones(), which create a tensor filled with 0 or 1 (see here), there is also the tf.random_normal() function, which creates a tensor filled with random values drawn from a normal distribution (by default with mean 0.0 and standard deviation 1.0).
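There is also the tf.truncated_normal() function, which creates a tensor filled with values drawn from a truncated normal distribution: values that lie more than two standard deviations from the mean are dropped and drawn again.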
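To make the "truncated" part concrete, here is a minimal pure-Python sketch of one way to draw such values: sample from a normal distribution and redraw any value that falls more than two standard deviations from the mean. It is an illustration under that assumption, not Tensorflow's actual implementation, and the function name `truncated_normal_sample` is hypothetical.

```python
import random

def truncated_normal_sample(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n values from a normal distribution, re-drawing any value that
    lies more than 2 standard deviations from the mean (rejection sampling)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        # keep only values inside [mean - 2 * stddev, mean + 2 * stddev]
        if abs(x - mean) <= 2 * stddev:
            samples.append(x)
    return samples

values = truncated_normal_sample(10000, mean=0.0, stddev=0.1, seed=42)
print(min(values) >= -0.2, max(values) <= 0.2)  # → True True
```

Cutting off the tails like this keeps any single initial weight from being unusually large, which is one reason truncated normals are a popular default for weight initialization.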
With this knowledge, we can create the weight matrices and bias vectors for a neural network.
weights = tf.Variable(tf.truncated_normal([256 * 256, 10]))
biases = tf.Variable(tf.zeros([10]))
print(weights.get_shape().as_list())
print(biases.get_shape().as_list())
>>>[65536, 10]
>>>[10]
1.2 Graphs and sessions in Tensorflow
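In Tensorflow, all the different variables and the operations on these variables are stored in a Graph. Once you have built a graph that contains all the computational steps of your model, you can run this graph in a Session. The session distributes all the computations across the available CPUs and GPUs.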
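graph = tf.Graph()
with graph.as_default():
    #these variables belong to 'graph', so they can be run in a session that uses this graph
    a = tf.Variable(8, dtype=tf.float32)
    b = tf.Variable(tf.zeros([2,2], tf.float32))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print(a)
    print(session.run(a))
    print(session.run(b))
>>> <tf.Variable 'Variable:0' shape=() dtype=float32_ref>
>>> 8.0
>>> [[ 0. 0.]
>>> [ 0. 0.]]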
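The split between building a graph and running it in a session can be illustrated without Tensorflow. The following toy pure-Python sketch (the `Node` class and the `const`, `add`, `mul` and `run` functions are hypothetical, not part of any library) shows the idea: building the graph only records operations, and nothing is computed until `run` walks the graph, similar in spirit to `session.run()`.

```python
class Node:
    """A node in a toy computation graph: an operation plus its input nodes."""
    def __init__(self, op, *inputs, value=None):
        self.op = op          # "const", "add" or "mul"
        self.inputs = inputs  # upstream nodes
        self.value = value    # only used by "const" nodes

def const(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", a, b)

def mul(a, b):
    return Node("mul", a, b)

def run(node):
    """Evaluate a node by recursively evaluating its inputs (the 'session' step)."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: " + node.op)

# Building the graph: no arithmetic happens here.
a = const(8.0)
b = const(2.0)
c = add(mul(a, b), b)    # describes a * b + b
print(type(c).__name__)  # → Node
print(run(c))            # → 18.0
```

Just like `print(a)` above prints a tf.Variable object rather than its value, `print(c)` here would only show a `Node`; the value appears once the graph is executed.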
1.3 Placeholders and feed_dicts
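We have seen the various ways of creating constants and variables. Tensorflow also has placeholders, which do not need an initial value and only serve to reserve the necessary memory space. In a session, these placeholders can be filled with (external) data through a feed_dict.
Below is an example of how placeholders are used.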
import numpy as np
import tensorflow as tf

list_of_points1_ = [[1,2], [3,4], [5,6], [7,8]]
list_of_points2_ = [[15,16], [13,14], [11,12], [9,10]]
list_of_points1 = np.array([np.array(elem).reshape(1,2) for elem in list_of_points1_])
list_of_points2 = np.array([np.array(elem).reshape(1,2) for elem in list_of_points2_])

graph = tf.Graph()
with graph.as_default():
    #we should use a tf.placeholder() to create a variable whose value you will fill in later (during session.run()).
    #this can be done by 'feeding' the data into the placeholder.
    #below we see an example of a method which uses two placeholder arrays of shape (1, 2) to calculate the euclidean distance
    point1 = tf.placeholder(tf.float32, shape=(1, 2))
    point2 = tf.placeholder(tf.float32, shape=(1, 2))

    def calculate_euclidean_distance(point1, point2):
        difference = tf.subtract(point1, point2)
        power2 = tf.pow(difference, tf.constant(2.0, shape=(1,2)))
        add = tf.reduce_sum(power2)
        euclidean_distance = tf.sqrt(add)
        return euclidean_distance

    dist = calculate_euclidean_distance(point1, point2)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for ii in range(len(list_of_points1)):
        point1_ = list_of_points1[ii]
        point2_ = list_of_points2[ii]
        #the feed_dict maps each placeholder to the value it receives in this run
        feed_dict = {point1 : point1_, point2 : point2_}
        distance = session.run([dist], feed_dict=feed_dict)
        print("the distance between {} and {} -> {}".format(point1_, point2_, distance))
>>> the distance between [[1 2]] and [[15 16]] -> [19.79899]
>>> the distance between [[3 4]] and [[13 14]] -> [14.142136]
>>> the distance between [[5 6]] and [[11 12]] -> [8.485281]
>>> the distance between [[7 8]] and [[ 9 10]] -> [2.8284271]
2. Neural networks in Tensorflow
2.1 Introduction
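A graph that contains a neural network (as shown in the figure above) should contain the following steps:
1. The input datasets: the training dataset and labels, the test dataset and labels (and the validation dataset and labels). The test and validation datasets can be placed in a tf.constant(), while the training dataset is placed in a tf.placeholder() so that it can be fed in batches during training (stochastic gradient descent).
2. The neural network **model** with all of its layers. This can be a simple fully connected neural network consisting of a single layer, or a more complex neural network consisting of 5, 9, 16 or more layers.
3. The weight matrices and **bias vectors**, defined and initialized with the appropriate shapes (one weight matrix and one bias vector per layer).
4. The loss value: the model outputs a vector of logits (the estimated training labels), and the loss value is calculated by comparing the logits with the actual labels (softmax with a cross-entropy function). The loss value indicates how close the estimated training labels are to the actual training labels, and is used to update the weights.
5. The optimizer: it uses the calculated loss value to update the weights and biases through backpropagation.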
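Steps 4 and 5 rest on the softmax cross-entropy loss. Below is a minimal pure-Python sketch of what that loss computes for a single example, assuming one-hot labels; the function names are hypothetical, and in Tensorflow this computation is done by `tf.nn.softmax_cross_entropy_with_logits`.

```python
import math

def softmax(logits):
    # Subtracting the maximum improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, one_hot_label):
    # Cross-entropy -sum(y_i * log(p_i)); for a one-hot label this is -log(p_correct).
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)

logits = [2.0, 1.0, 0.1]   # raw model outputs for 3 classes
label = [1.0, 0.0, 0.0]    # the correct class is class 0
print(softmax(logits))
print(softmax_cross_entropy(logits, label))
```

The loss is small when the probability assigned to the correct class is close to 1 and grows quickly as that probability drops, which is what the optimizer then tries to reduce.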
2.2 Loading the data
Next, let's load the datasets we will use to train and test the neural network. For this we download the MNIST and CIFAR-10 datasets. The MNIST training set contains 60,000 images of handwritten digits, each of size 28 x 28 x 1 (grayscale). The CIFAR-10 dataset also contains 60,000 images (with 3 channels) of size 32 x 32 x 3, showing 10 different objects (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). Since both datasets contain 10 different classes, both have 10 labels.
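Both datasets store each image as one flat row of pixel values: 28 * 28 * 1 = 784 values for MNIST and 32 * 32 * 3 = 3072 values for CIFAR-10. According to the CIFAR-10 documentation, each row holds the 1024 red values first, then the 1024 green values, then the 1024 blue values, each plane in row-major order. The pure-Python sketch below (the function name `cifar_row_to_hwc` is hypothetical) shows how such a row maps to a height x width x channels image; reshaping the row directly to 32 x 32 x 3 would mix up the channels.

```python
def cifar_row_to_hwc(row, height=32, width=32, channels=3):
    """Convert a flat CIFAR-10 row (colour planes stored one after another)
    into a nested list indexed as image[y][x][c]."""
    plane = height * width
    assert len(row) == plane * channels
    return [[[row[c * plane + y * width + x] for c in range(channels)]
             for x in range(width)]
            for y in range(height)]

# Synthetic row: red plane all 1, green plane all 2, blue plane all 3.
row = [1] * 1024 + [2] * 1024 + [3] * 1024
image = cifar_row_to_hwc(row)
print(len(image), len(image[0]), len(image[0][0]))  # → 32 32 3
print(image[5][7])                                   # → [1, 2, 3]
```

With NumPy, the same conversion for a single row is `row.reshape(3, 32, 32).transpose(1, 2, 0)`.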
First, let's define some helper methods that make it easy to load and format the data.
import numpy as np
import tensorflow as tf

def randomize(dataset, labels):
    #shuffle the images and the labels with the same random permutation
    permutation = np.random.permutation(labels.shape[0])
    shuffled_dataset = dataset[permutation, :, :]
    shuffled_labels = labels[permutation]
    return shuffled_dataset, shuffled_labels

def one_hot_encode(np_array):
    #turn integer labels 0..9 into one-hot rows of length 10
    return (np.arange(10) == np_array[:,None]).astype(np.float32)

def reformat_data(dataset, labels, image_width, image_height, image_depth):
    np_dataset_ = np.array([np.array(image_data).reshape(image_width, image_height, image_depth) for image_data in dataset])
    np_labels_ = one_hot_encode(np.array(labels, dtype=np.float32))
    np_dataset, np_labels = randomize(np_dataset_, np_labels_)
    return np_dataset, np_labels

def flatten_tf_array(array):
    #flatten a [batch, width, height, depth] tensor into [batch, width * height * depth]
    shape = array.get_shape().as_list()
    return tf.reshape(array, [shape[0], shape[1] * shape[2] * shape[3]])

def accuracy(predictions, labels):
    #percentage of rows where the predicted class (arg max) equals the labelled class
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])
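These methods can be used to one-hot encode the labels, load the data into randomly shuffled arrays, and flatten matrices (since a fully connected network needs a flat matrix as input).
Once these helper functions are defined, we can load the MNIST and CIFAR-10 datasets as follows: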
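What `one_hot_encode()` and `accuracy()` compute can be spelled out in pure Python. The sketch below is for illustration only (the `_py` function names are hypothetical); the NumPy versions above do the same thing, with `np.arange(10) == np_array[:, None]` using broadcasting to compare every label against every class index at once.

```python
def one_hot_encode_py(labels, num_classes=10):
    # Row i has a 1.0 in column labels[i] and 0.0 everywhere else.
    return [[1.0 if c == int(label) else 0.0 for c in range(num_classes)]
            for label in labels]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_py(predictions, one_hot_labels):
    # Percentage of rows where the highest-scoring class equals the labelled class.
    correct = sum(argmax(p) == argmax(y) for p, y in zip(predictions, one_hot_labels))
    return 100.0 * correct / len(predictions)

labels = one_hot_encode_py([3, 0], num_classes=4)
print(labels)  # → [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
predictions = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.7, 0.1, 0.1]]
print(accuracy_py(predictions, labels))  # → 50.0
```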
import pickle
import numpy as np
from mnist import MNIST  # provided by the python-mnist package

mnist_folder = './data/mnist/'
mnist_image_width = 28
mnist_image_height = 28
mnist_image_depth = 1
mnist_num_labels = 10

mndata = MNIST(mnist_folder)
mnist_train_dataset_, mnist_train_labels_ = mndata.load_training()
mnist_test_dataset_, mnist_test_labels_ = mndata.load_testing()

mnist_train_dataset, mnist_train_labels = reformat_data(mnist_train_dataset_, mnist_train_labels_, mnist_image_width, mnist_image_height, mnist_image_depth)
mnist_test_dataset, mnist_test_labels = reformat_data(mnist_test_dataset_, mnist_test_labels_, mnist_image_width, mnist_image_height, mnist_image_depth)

#the raw images are flat lists of 784 pixel values
print("There are {} images, each of size {}".format(len(mnist_train_dataset_), len(mnist_train_dataset_[0])))
print("Meaning each image has the size of 28*28*1 = {}".format(mnist_image_width * mnist_image_height * mnist_image_depth))
print("The training set contains the following {} labels: {}".format(len(np.unique(mnist_train_labels_)), np.unique(mnist_train_labels_)))
print('Training set shape', mnist_train_dataset.shape, mnist_train_labels.shape)
print('Test set shape', mnist_test_dataset.shape, mnist_test_labels.shape)

train_dataset_mnist, train_labels_mnist = mnist_train_dataset, mnist_train_labels
test_dataset_mnist, test_labels_mnist = mnist_test_dataset, mnist_test_labels

######################################################################################

cifar10_folder = './data/cifar10/'
train_datasets = ['data_batch_1', 'data_batch_2', 'data_batch_3', 'data_batch_4', 'data_batch_5', ]
test_dataset = ['test_batch']
c10_image_height = 32
c10_image_width = 32
c10_image_depth = 3
c10_num_labels = 10

with open(cifar10_folder + test_dataset[0], 'rb') as f0:
    c10_test_dict = pickle.load(f0, encoding='bytes')

c10_test_dataset, c10_test_labels = c10_test_dict[b'data'], c10_test_dict[b'labels']
#each CIFAR-10 row stores the red, green and blue planes one after another: reorder to 32 x 32 x 3
c10_test_dataset = c10_test_dataset.reshape(-1, c10_image_depth, c10_image_height, c10_image_width).transpose(0, 2, 3, 1)
test_dataset_cifar10, test_labels_cifar10 = reformat_data(c10_test_dataset, c10_test_labels, c10_image_width, c10_image_height, c10_image_depth)

c10_train_dataset, c10_train_labels = [], []
for train_dataset in train_datasets:
    with open(cifar10_folder + train_dataset, 'rb') as f0:
        c10_train_dict = pickle.load(f0, encoding='bytes')
        c10_train_dataset_, c10_train_labels_ = c10_train_dict[b'data'], c10_train_dict[b'labels']
        c10_train_dataset.append(c10_train_dataset_)
        c10_train_labels += c10_train_labels_

c10_train_dataset = np.concatenate(c10_train_dataset, axis=0)
#same channel reordering as for the test set
c10_train_dataset = c10_train_dataset.reshape(-1, c10_image_depth, c10_image_height, c10_image_width).transpose(0, 2, 3, 1)
train_dataset_cifar10, train_labels_cifar10 = reformat_data(c10_train_dataset, c10_train_labels, c10_image_width, c10_image_height, c10_image_depth)
del c10_train_dataset
del c10_train_labels

print("The training set contains the following labels: {}".format(np.unique(c10_train_dict[b'labels'])))
print('Training set shape', train_dataset_cifar10.shape, train_labels_cifar10.shape)
print('Test set shape', test_dataset_cifar10.shape, test_labels_cifar10.shape)
You can download the MNIST dataset from Yann LeCun's website. After downloading and unzipping it, you can load the data with the python-mnist tool. The CIFAR-10 dataset can be downloaded from here.
2.3 Creating a simple one-layer neural network
The simplest form of a neural network is a one-layer linear Fully Connected Neural Network (FCNN). Mathematically, it consists of a single matrix multiplication.
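As a concrete illustration of "a single matrix multiplication", here is a pure-Python sketch of the forward pass logits = x · W + b for one flattened input vector. The tiny sizes and values are made up for illustration; for MNIST, x would have 784 entries and W would be a 784 x 10 matrix.

```python
def fcnn_forward(x, weights, bias):
    """One-layer fully connected network: logits[j] = sum_i x[i] * W[i][j] + b[j]."""
    num_outputs = len(bias)
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(num_outputs)]

x = [1.0, 2.0, 3.0]   # a flattened input with 3 "pixels"
W = [[0.1, 0.2],      # 3 x 2 weight matrix (3 inputs, 2 classes)
     [0.3, 0.4],
     [0.5, 0.6]]
b = [0.0, 1.0]        # one bias per class
print(fcnn_forward(x, W, b))
```

The Tensorflow model below does the same thing for a whole batch at once with `tf.matmul(flatten_tf_array(data), weights) + bias`.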
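It is best to start with such a simple NN in Tensorflow before moving on to more complex neural networks. When we get to those more complex networks, only the model (step 2) and the weights (step 3) of the graph change; the other steps stay the same.
We can build a one-layer FCNN with the following code: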
import tensorflow as tf

image_width = mnist_image_width
image_height = mnist_image_height
image_depth = mnist_image_depth
num_labels = mnist_num_labels

#the dataset
train_dataset = mnist_train_dataset
train_labels = mnist_train_labels
test_dataset = mnist_test_dataset
test_labels = mnist_test_labels

#number of iterations and learning rate
num_steps = 10001
display_step = 1000
learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():
    #1) First we put the input data in a Tensorflow friendly form.
    #In this simple example the whole training set is stored as a constant (full-batch gradient descent),
    #so session.run() needs no feed_dict. For mini-batch SGD, use tf.placeholder() and feed batches instead.
    tf_train_dataset = tf.constant(train_dataset, tf.float32)
    tf_train_labels = tf.constant(train_labels, tf.float32)
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    #2) Then, the weight matrices and bias vectors are initialized
    #as a default, tf.truncated_normal() is used for the weight matrix and tf.zeros() is used for the bias vector.
    weights = tf.Variable(tf.truncated_normal([image_width * image_height * image_depth, num_labels]))
    bias = tf.Variable(tf.zeros([num_labels]))

    #3) define the model:
    #A one layered FCNN simply consists of a matrix multiplication
    def model(data, weights, bias):
        return tf.matmul(flatten_tf_array(data), weights) + bias

    logits = model(tf_train_dataset, weights, bias)

    #4) calculate the loss, which will be used in the optimization of the weights
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    #5) Choose an optimizer. Many are available.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    #6) The predicted values for the images in the train dataset and test dataset are assigned to the variables train_prediction and test_prediction.
    #It is only necessary if you want to know the accuracy by comparing it with the actual values.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, bias))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % display_step == 0):
            train_accuracy = accuracy(predictions, train_labels[:, :])
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)
>>> Initialized
>>> step 0000 : loss is 2349.55, accuracy on training set 10.43 %, accuracy on test set 34.12 %
>>> step 0100 : loss is 3612.48, accuracy on training set 89.26 %, accuracy on test set 90.15 %
>>> step 0200 : loss is 2634.40, accuracy on training set 91.10 %, accuracy on test set 91.26 %
>>> step 0300 : loss is 2109.42, accuracy on training set 91.62 %, accuracy on test set 91.56 %
>>> step 0400 : loss is 2093.56, accuracy on training set 91.85 %, accuracy on test set 91.67 %
>>> step 0500 : loss is 2325.58, accuracy on training set 91.83 %, accuracy on test set 91.67 %
>>> step 0600 : loss is 22140.44, accuracy on training set 68.39 %, accuracy on test set 75.06 %
>>> step 0700 : loss is 5920.29, accuracy on training set 83.73 %, accuracy on test set 87.76 %
>>> step 0800 : loss is 9137.66, accuracy on training set 79.72 %, accuracy on test set 83.33 %
>>> step 0900 : loss is 15949.15, accuracy on training set 69.33 %, accuracy on test set 77.05 %
>>> step 1000 : loss is 1758.80, accuracy on training set 92.45 %, accuracy on test set 91.79 %
In the graph we load the data, define the weight matrices and the model, calculate the loss value from the logits vector, and pass it to the optimizer, which updates the weights over num_steps iterations.
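In the fully connected NN above we used the gradient descent optimizer to optimize the weights. However, Tensorflow offers many different optimizers. The most commonly used ones are GradientDescentOptimizer, AdamOptimizer and AdagradOptimizer, so if you are building a CNN, I recommend trying these.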
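To give a feel for how optimizers differ, here is a pure-Python sketch of the plain gradient descent update and of the Adam update rule (with the commonly used defaults beta1=0.9, beta2=0.999, eps=1e-8), both minimising the toy function f(w) = (w - 3)^2. The toy problem, learning rates and step counts are arbitrary choices for illustration, not settings used in this article.

```python
import math

def grad(w):
    # Derivative of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient, scaled by the learning rate
    return w

def adam(w, lr=0.1, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive, per-parameter step
    return w

print(gradient_descent(0.0))
print(adam(0.0))
```

Plain gradient descent takes steps proportional to the raw gradient, while Adam rescales each step by a running estimate of the gradient's magnitude and adds momentum; that is why the two can behave quite differently with the same learning rate.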
Sebastian Ruder has a good blog post on the differences between the various optimizers; it lets you understand them in more detail.
2.4 The many faces of Tensorflow
Tensorflow has many layers of abstraction, which means the same operation can be done at different levels. Here is a simple example: the operation
logits = tf.matmul(tf_train_dataset, weights) + biases
can also be written as
logits = tf.nn.xw_plus_b(tf_train_dataset, weights, biases)
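This is most visible in the layers API, a layer with a high level of abstraction that makes it very easy to create neural networks consisting of many different layers. For example, the … or … functions are used to create convolutional and fully connected layers. With these functions, the number of layers, the size or depth of the filters, the type of activation function and so on can be specified as parameters. The weight and bias matrices are then created automatically, together with the activation functions and dropout regularization layers.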
For example, by using the layers API, the following code:
import tensorflow as tf

w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth], stddev=0.1))
b1 = tf.Variable(tf.zeros([filter_depth]))

layer1_conv = tf.nn.conv2d(data, w1, [1, 1, 1, 1], padding='SAME')
layer1_relu = tf.nn.relu(layer1_conv + b1)
#the pooling layer takes the output of the ReLU activation as its input
layer1_pool = tf.nn.max_pool(layer1_relu, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
can be replaced with
from tflearn.layers.conv import conv_2d, max_pool_2d

#conv_2d creates the weights and biases itself and applies the ReLU activation
layer1_conv = conv_2d(data, filter_depth, filter_size, activation='relu')
layer1_pool = max_pool_2d(layer1_conv, 2, strides=2)
As you can see, we do not need to define the weights, biases or activation functions ourselves. Especially when you build a neural network with many layers, this keeps the code clean and tidy.
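Even though the layers API creates the weights and biases for you, their shapes are the same as in the hand-written version: a convolutional layer has a filter_size x filter_size x image_depth x filter_depth weight tensor plus filter_depth biases. The pure-Python sketch below computes that parameter count and the spatial output size with padding='SAME', for which Tensorflow's output size is ceil(input_size / stride). The example sizes (a 28 x 28 x 1 MNIST image, 5 x 5 filters, depth 32) are assumptions chosen for illustration.

```python
import math

def conv_params(filter_size, image_depth, filter_depth):
    # Weights: filter_size * filter_size * image_depth * filter_depth, plus one bias per filter.
    return filter_size * filter_size * image_depth * filter_depth + filter_depth

def same_output_size(input_size, stride):
    # With padding='SAME', the spatial output size is ceil(input_size / stride).
    return math.ceil(input_size / stride)

height = 28
after_conv = same_output_size(height, stride=1)      # conv with strides [1, 1, 1, 1]
after_pool = same_output_size(after_conv, stride=2)  # 2 x 2 max pool with stride 2
print(conv_params(5, 1, 32))   # → 832
print(after_conv, after_pool)  # → 28 14
```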
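However, if you are new to Tensorflow, this is not a good way to learn how to build different kinds of neural networks, because tflearn does all the work for you.
That is why we will not use the layers API in this article, but once you fully understand how neural networks are built in Tensorflow, I do recommend using it.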