
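Before training an AI model, the dataset has to be processed. TensorFlow provides the tf.data module for this, and its API makes dataset preprocessing straightforward. tf.data supports a wide range of operations, such as cropping images, shuffling them, and splitting them into batches.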

Dataset loading overview

tf.data can load datasets from several data formats, including:

●NumPy array data
●Python generator data
●TFRecords data
●Text file data
●CSV file data

1 Loading common data formats with tf.data

●Using NumPy array data

Build the data with NumPy, then pass it to a tf.data Dataset.

import tensorflow as tf
import numpy as np
# Build 4 data items with NumPy
input_data = np.arange(4)
# Pass the data to a Dataset
dataset = tf.data.Dataset.from_tensor_slices(input_data)
for data in dataset:
    # Print each element; the Dataset yields elements as tensors
    print(data)

The output is one tensor per element:

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
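
from_tensor_slices also accepts a tuple of arrays and slices them together along the first axis, which is a common way to pair features with labels. The following is a minimal sketch; the `features` and `labels` arrays are made-up illustrative values, not data from this article.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 3 samples with 2 features each, plus one label per sample
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
labels = np.array([0, 1, 0], dtype=np.int64)

# Both arrays are sliced along axis 0, so each element is a (feature, label) pair
pair_dataset = tf.data.Dataset.from_tensor_slices((features, labels))
pairs = [(f.numpy().tolist(), int(l.numpy())) for f, l in pair_dataset]
for feature, label in pairs:
    print(feature, label)
```

The arrays must have the same length along the first axis, since each dataset element takes one slice from each of them.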

●Reading data from a text file

Prepare a text file, file.txt, and read its contents into tf.data. The file contains:

Tf dataset load numpy data
Tf dataset load txt file data
Tf dateset load CSV file data

Code to load the text file:

import tensorflow as tf
# Load the contents of the text file with TextLineDataset (one element per line)
dataset = tf.data.TextLineDataset("file.txt")
for line in dataset:
    print(line)

Output of loading the text file (each Tensor holds one line of the file):

tf.Tensor(b'Tf dataset load numpy data', shape=(), dtype=string)
tf.Tensor(b'Tf dataset load txt file data', shape=(), dtype=string)
tf.Tensor(b'Tf dateset load CSV file data', shape=(), dtype=string)
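
Each element is a scalar string tensor holding raw bytes, which is why the output shows a `b'...'` prefix. If ordinary Python strings are needed, the bytes can be decoded. The sketch below assumes UTF-8 text and writes its own small stand-in file to a temporary directory so that it runs on its own.

```python
import os
import tempfile
import tensorflow as tf

# Write a small stand-in text file so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Tf dataset load numpy data\nTf dataset load txt file data\n")

text_dataset = tf.data.TextLineDataset(path)
# .numpy() returns bytes for a string tensor; decode() turns them into str
decoded_lines = [line.numpy().decode("utf-8") for line in text_dataset]
print(decoded_lines)
```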

●Reading data from a CSV file

Prepare a CSV file, file.csv, with the following contents:

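(Image of the file contents not available; per the output further down, the columns are Year, Month, Day and Hour.)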

Code to load the CSV file:

import tensorflow as tf
import pandas as pd
# Read the CSV data with pandas
data = pd.read_csv('file.csv')
# Pass the loaded data to a Dataset; dict(data) maps each column name to its values
f_slices = tf.data.Dataset.from_tensor_slices(dict(data))
for d in f_slices:
    print(d)

Output of loading the CSV file (each element is a dict of Tensors holding one row of the file):

{'Year': , 'Month': , 'Day': , 'Hour': }
{'Year': , 'Month': , 'Day': , 'Hour': }
{'Year': , 'Month': , 'Day': , 'Hour': }
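
To make the element structure concrete, here is a small sketch that builds the DataFrame in memory instead of reading a file. Only the column names (Year, Month, Day, Hour) come from the output above; the row values are invented for illustration.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical rows; only the column names are taken from the article's output
frame = pd.DataFrame({"Year": [2021, 2021], "Month": [1, 2], "Day": [10, 20], "Hour": [8, 9]})

# dict(frame) maps each column name to a Series; every dataset element is one row
csv_dataset = tf.data.Dataset.from_tensor_slices(dict(frame))
rows = [{key: int(value.numpy()) for key, value in row.items()} for row in csv_dataset]
print(rows)
```

Each element is a dict keyed by column name, so a single field can be read with, for example, `row["Year"]`.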

●Building data with a Python generator

Data can be passed to tf.data from a Python generator. Example code:

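import tensorflow as tf
# Generator function: yields 0, 1, ..., stop - 1
def build_data(stop):
  i = 0
  while i < stop:
    yield i
    i += 1
# Assumed completion (the listing is cut off after "while i"): wrap the generator in a Dataset with stop=5
dataset = tf.data.Dataset.from_generator(build_data, args=[5], output_types=tf.int32)
for data in dataset:
    print(data)

Example output (Tensors from 5 iterations):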
tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)

	

2 Common data processing with tf.data

tf.data commonly uses the following operations to preprocess data: repeat, batch, shuffle, map, and others.

●tf.data repeat operation

Calling repeat replays the original data; the argument of repeat(x) sets how many times the data is repeated.
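
A minimal sketch of repeat on a three-element dataset (the values are arbitrary):

```python
import tensorflow as tf

base = tf.data.Dataset.from_tensor_slices([0, 1, 2])
# repeat(2) yields the whole dataset twice, one full pass after the other
repeated = [int(x.numpy()) for x in base.repeat(2)]
print(repeated)  # [0, 1, 2, 0, 1, 2]
```

Calling repeat() without an argument repeats the data indefinitely, so such a dataset needs an explicit stopping condition, for example take(n).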

●tf.data batch operation

Calling batch groups the data into batches; the argument of batch(x) sets the number of elements per batch.
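
When the dataset size is not a multiple of the batch size, the last batch is shorter. The sketch below uses ten arbitrary numbers to show this, along with the drop_remainder option that discards the short batch:

```python
import tensorflow as tf

numbers = tf.data.Dataset.from_tensor_slices(list(range(10)))
# 10 elements in batches of 4: the last batch holds the 2 leftover elements
batches = [b.numpy().tolist() for b in numbers.batch(4)]
# drop_remainder=True discards that short final batch
full_batches = [b.numpy().tolist() for b in numbers.batch(4, drop_remainder=True)]
print(batches)       # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(full_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```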

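●tf.data shuffle operation: randomizing the data order

shuffle is commonly used during preprocessing to randomize the order of the elements in a dataset. It takes a buffer_size=x argument: elements are loaded into a buffer of that size, and the output is drawn at random from the buffer.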
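
The buffer size controls how thoroughly the data is mixed. As a rough sketch with eight arbitrary numbers: a buffer of 1 leaves the order unchanged, while a buffer at least as large as the dataset allows a full shuffle. The seed value is an arbitrary choice for repeatability.

```python
import tensorflow as tf

items = tf.data.Dataset.from_tensor_slices(list(range(8)))
# A buffer of 1 only ever holds the next element, so the order is unchanged
unshuffled = [int(x.numpy()) for x in items.shuffle(buffer_size=1)]
# A buffer at least as large as the dataset can mix all elements
shuffled = [int(x.numpy()) for x in items.shuffle(buffer_size=8, seed=42)]
print(unshuffled)
print(shuffled)
```

Shuffling only reorders elements: every input value still appears exactly once in the output.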

●tf.data map operation

map applies a function to every element of the dataset to transform it. It can also be used to read images and to apply operations such as rotation.
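
A minimal sketch of map on plain numbers, squaring each element (the values are arbitrary):

```python
import tensorflow as tf

values = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])
# map applies the function to every element and returns a new dataset
squared = [int(x.numpy()) for x in values.map(lambda x: x * x)]
print(squared)  # [1, 4, 9, 16]
```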

Example:
import tensorflow as tf
import numpy as np
# Build 12 data items with NumPy
input_data = np.arange(12)
# Pass the data to a Dataset, then apply shuffle (buffer of 10 elements), batch (4 elements per batch) and repeat (2 passes over the data)
dataset = tf.data.Dataset.from_tensor_slices(input_data).shuffle(buffer_size=10).batch(4).repeat(2)
for data in dataset:
    print(data)
Example output (each Tensor holds 4 elements, each value appears twice, and the order is shuffled on every pass):
tf.Tensor([8 3 9 1], shape=(4,), dtype=int64)
tf.Tensor([2 0 4 5], shape=(4,), dtype=int64)
tf.Tensor([ 7 11 10  6], shape=(4,), dtype=int64)
tf.Tensor([6 8 5 4], shape=(4,), dtype=int64)
tf.Tensor([ 7 10  2 11], shape=(4,), dtype=int64)
tf.Tensor([3 1 0 9], shape=(4,), dtype=int64)

	

Image rotation example. Example code:

	
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np


(train_data, train_label), (_, _) = tf.keras.datasets.mnist.load_data()
train_data = np.expand_dims(train_data.astype(np.float32) / 255.0, axis=-1)
mnist_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_label))
# Define a rotation function; tf.image.rot90 rotates the image by 90 degrees
def rot90(image, label):
    image = tf.image.rot90(image)
    return image, label
# Apply the rotation function to every element with map
mnist_dataset = mnist_dataset.map(rot90)
for image, label in mnist_dataset.take(1):
    # Use the label as the figure title
    plt.title(label.numpy())
    # Drop the channel axis: the image has shape (28, 28, 1)
    plt.imshow(image.numpy()[:, :, 0])
    plt.show()

	

Output with the image loaded normally:

	

Output after running the example code, with the image rotated:
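
The effect of tf.image.rot90 can also be checked numerically on a tiny array. The sketch below uses a made-up 2x2 single-channel "image"; the function rotates counter-clockwise, and its k argument sets the number of 90-degree rotations.

```python
import numpy as np
import tensorflow as tf

# A 2x2 single-channel "image" with shape (height, width, channels)
tiny = np.array([[1, 2], [3, 4]], dtype=np.float32).reshape(2, 2, 1)
# rot90 rotates counter-clockwise by 90 degrees; k sets the number of rotations
rotated = tf.image.rot90(tiny).numpy()[:, :, 0].tolist()
rotated_twice = tf.image.rot90(tiny, k=2).numpy()[:, :, 0].tolist()
print(rotated)        # [[2.0, 4.0], [1.0, 3.0]]
print(rotated_twice)  # [[4.0, 3.0], [2.0, 1.0]]
```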

	

	

MNIST dataset preprocessing

TensorFlow Datasets provides a collection of datasets that are ready to use with TensorFlow. It handles downloading and preparing the data and building a tf.data.Dataset.

The example code requires:

Python 3.6 environment
tensorflow==2.1.0 (pip3 install tensorflow==2.1.0)
tensorflow_datasets==4.4.0 (pip3 install tensorflow-datasets==4.4.0)
Example code:
import tensorflow as tf
import tensorflow_datasets as tfds


# Run the dataset in TensorFlow eager mode (already the default in TensorFlow 2.x)
tf.compat.v1.enable_eager_execution()


# Load the MNIST training data. This step downloads and prepares the data
# unless you explicitly pass `download=False`. Once the data is prepared,
# later `load` calls reuse it instead of downloading it again. You can set
# where the data is saved/loaded with `data_dir=` (default: `~/tensorflow_datasets/`).
mnist_train = tfds.load(name="mnist", split="train")
assert isinstance(mnist_train, tf.data.Dataset)


mnist_builder = tfds.builder("mnist")
mnist_builder.download_and_prepare()
mnist_train = mnist_builder.as_dataset(split="train")
# Repeat the dataset, shuffle it, and split it into batches
mnist_train = mnist_train.repeat().shuffle(1024).batch(32)
# prefetch lets the input pipeline fetch batches asynchronously while the model trains
mnist_train = mnist_train.prefetch(tf.data.experimental.AUTOTUNE)
info = mnist_builder.info
print(info.features["label"].names)
mnist_test, info = tfds.load("mnist", split="test", with_info=True)
print(info)
# Visualize sample data with tfds.show_examples (dataset first, then its info)
fig = tfds.show_examples(mnist_test, info)

	

Example code output:

	
# Dataset label names
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# Dataset info
tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='/home/fabian/tensorflow_datasets/mnist/3.0.1',
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': ,
        'train': ,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)
Visualized sample images:
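
As the dataset info shows, each element of the MNIST dataset is a dict with a uint8 `image` of shape (28, 28, 1) and an int64 `label`. Before training, the images are usually scaled to floats. The sketch below shows one way this might be done with map. It uses random synthetic data in place of the real download so that it runs offline, and the names `fake_mnist` and `normalize` are illustrative, not part of tfds.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for tfds MNIST elements: dicts with a uint8 image and an int64 label
fake_images = np.random.randint(0, 256, size=(8, 28, 28, 1), dtype=np.uint8)
fake_labels = np.arange(8, dtype=np.int64)
fake_mnist = tf.data.Dataset.from_tensor_slices({"image": fake_images, "label": fake_labels})

def normalize(example):
    # Scale pixel values from [0, 255] to [0.0, 1.0] and return an (image, label) pair
    image = tf.cast(example["image"], tf.float32) / 255.0
    return image, example["label"]

pipeline = (fake_mnist.map(normalize)
            .shuffle(8)
            .batch(4)
            .prefetch(tf.data.experimental.AUTOTUNE))
batch_shapes = []
max_pixel = 0.0
for images, labels in pipeline:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))
    max_pixel = max(max_pixel, float(tf.reduce_max(images).numpy()))
    print(images.shape, images.dtype, labels.shape)
```

With the real data, the same `normalize` function could be applied to the dataset returned by tfds.load before the shuffle and batch steps.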



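About the author: Chen Yuanbin (陈远斌) holds a bachelor's degree from Nankai University and is an R&D engineer at 海云捷迅 (AWcloud). He is familiar with OpenStack and Kubernetes, has contributed code to the community, and has development experience in OpenStack cloud computing.

Reviewing editor: Tang Zihong (汤梓红)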
