
TinyML: Detecting Baby Cries Using ChatGPT and Synthetic Data

2023-07-13


Description

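TinyML is a field of machine learning focused on bringing the power of artificial intelligence to low-power devices. The technology is particularly useful for applications that require real-time processing. In machine learning, finding and collecting datasets remains a challenge. Synthetic data, however, makes it possible to train ML models in a way that is both cost-effective and adaptable, reducing the need for large amounts of real-world data.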

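In this project, I will show you how to build a baby cry detection system by training a model on the Edge Impulse platform and deploying it to an edge device such as the Arduino Nicla Voice. By training the machine learning model on synthetic data, we can distinguish between a baby crying and background noise.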

Here is a sneak peek of what is to come:

Collect a dataset for Edge Impulse, generated with AudioLDM (text-to-audio) and ChatGPT.
Train the model with Edge Impulse.
Export the model for analysis in netron.app.
Deploy the model to the Arduino Nicla Voice.
Evaluate and test on real-time data using the Arduino IDE.

Baby cry system deployment pipeline

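The diagram shows the components and steps involved in deploying a machine learning model that detects two conditions, baby crying and background noise, using text prompts generated by ChatGPT.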

Here is a step-by-step breakdown of the components in the pipeline diagram and how they interact:

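ChatGPT: ChatGPT is the starting point of the pipeline. It generates text prompts for the two conditions: baby crying and background noise.
Text-to-audio conversion: Once the text prompts are generated, we send them to a module that converts text to audio. This module creates audio files corresponding to the prompts for both conditions.
Model training: The generated audio files are uploaded to the Edge Impulse SaaS platform. This is a cloud-based platform that provides tools for developing, training, and deploying machine learning models for edge devices such as microcontrollers.
Model deployment: After training, the machine learning model is deployed to the Arduino Nicla Voice development board. This board is designed for building smart voice devices that process audio and run machine learning tasks.
Inference: Once deployed, the machine learning model processes real-time audio input from the microphone and detects whether the input audio represents a baby crying or background noise.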

The output of the machine learning model could then be used to trigger an action, such as turning on a light or sending a notification to a smartphone.

Arduino Nicla Voice development board overview

The Arduino Nicla Voice is a development board created in collaboration with Syntiant. Using Syntiant's ultra-low-power deep learning processor, the board provides always-on voice, gesture, and motion recognition at the edge.

[Image: Arduino Nicla Voice development board]

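Thanks to its compact size, the Nicla Voice can be integrated into wearable devices, adding AI while requiring minimal energy. With the Nicla Voice you can develop custom speech recognition models and run them on the board, enabling it to recognize specific words or phrases by analyzing your voice.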

Let's get started!

Generating text prompts with ChatGPT

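Using ChatGPT to generate varied prompts simplifies the process of writing prompts for my machine learning model, which has two classes: baby crying and background noise. It saves the time and effort that would otherwise go into brainstorming and writing the prompts by hand. This approach also yields a wider and more diverse range of prompts, which can improve the accuracy and effectiveness of the model.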

Here are my text prompts for the baby crying scenario, generated with ChatGPT.

prompts = [
"Baby Crying",
"Baby crying in bedroom",
"Baby crying loudly",
"Infant crying",
"Newborn crying",
"Crying baby",
"Upset baby",
"Distressed baby",
"Fussy baby",
"Weeping infant",
"Sobbing baby",
"Whimpering baby",
"Wailing baby",
"Bawling baby",
"Crying newborn",
"Tearful baby",
"Bawling infant",
"Mourning baby",
"Bellowing baby",
"Screaming baby",
"Howling baby",
"Squalling baby",
"Yowling baby",
"Crying baby in nursery",
"Wailing infant in bedroom",
"Whimpering baby in crib",
"Sobbing baby in bassinet",
"Crying baby in the dark",
"Upset baby in bed",
"Distressed baby in room",
"Fussy baby in cradle",
"Weeping infant in playpen",
"Sobbing baby in the corner",
"Whimpering baby in the closet",
"Wailing baby in the crib",
"Bawling baby in the nursery",
"Crying newborn in the bedroom",
"Tearful baby in the playroom",
"Bawling infant in the den",
"Mourning baby in the living room",
"Bellowing baby in the kitchen",
"Screaming baby in the bathroom",
"Howling baby in the hallway",
"Squalling baby in the dining room",
"Yowling baby in the family room",
"Crying baby in the middle of the night",
"Wailing infant in the early morning",
"Whimpering baby during naptime",
"Sobbing baby during mealtime",
"Crying baby during bathtime",
"Upset baby during diaper change",
"Distressed baby during playtime",
"Fussy baby during bedtime",
"Weeping infant during storytime",
"Sobbing baby during teething",
"Whimpering baby during vaccination",
"Wailing baby during check-up",
"Bawling baby during colic",
"Crying newborn during feeding",
"Tearful baby during immunization",
"Bawling infant during growth spurt",
"Mourning baby during illness",
"Bellowing baby during teething",
"Screaming baby during reflux",
"Howling baby during ear infection",
"Squalling baby during constipation",
"Yowling baby during sleep regression",
"Crying baby during travel",
"Wailing infant during car ride",
"Whimpering baby during flight",
"Sobbing baby during road trip",
"Crying baby during vacation",
"Upset baby during change of environment",
"Distressed baby during new experiences",
"Fussy baby during unfamiliar situations",
"Weeping infant during loud noises",
"Sobbing baby during separation anxiety",
"Whimpering baby during stranger danger",
"Wailing baby during socialization",
"Bawling baby during weaning",
"Crying newborn during swaddling",
"Tearful baby during bath",
"Bawling infant during burping",
"Mourning baby during pacifier weaning",
"Bellowing baby during crawling",
"Screaming baby during walking",
]

In addition, a language model such as ChatGPT can help me come up with creative prompts that I might not have thought of myself.

These are the background noise prompts.

prompts = [
"A hammer is hitting a wooden surface",
"A noise of nature",
"The sound of waves crashing on the shore",
"A thunderstorm in the distance",
"Traffic noise on a busy street",
"The hum of an air conditioning unit",
"Birds chirping in the morning",
"The sound of a train passing by",
"A group of people talking in a crowded room",
"The sound of raindrops hitting a tin roof",
"The buzz of a fluorescent light",
"The sound of footsteps on a wooden floor",
"The crackling of a campfire",
"The whirring of a ceiling fan",
"The sound of a basketball bouncing on concrete",
"A dog barking in the distance",
"The rustling of leaves in the wind",
"The buzzing of a bee or other insect",
"The sound of a church bell ringing",
"The roar of a waterfall",
"The tapping of a keyboard",
"The hiss of a steam engine",
"The clanging of pots and pans in a kitchen",
"The sound of a roaring fire in a fireplace",
"The hum of an electric generator",
"The sound of a lawnmower in the distance",
"The whistling of wind through a window crack",
"The clatter of dishes in a busy restaurant",
"The sound of a helicopter flying overhead",
"The tapping of rain on a metal roof",
"The gentle rustling of a book's pages turning",
"The creaking of a wooden chair",
"The sound of a pencil scratching on paper",
"The chirping of crickets at night",
"The crackling of a vinyl record playing",
"The hissing of an old radio",
"The sound of a pencil sharpener grinding",
"The gurgling of a coffee maker",
"The sound of a ticking clock",
"The roar of an airplane engine",
"The bubbling of a fish tank filter",
"The clanking of dishes being washed in a sink",
"The sound of a typewriter clacking",
"The roar of a lion in the wild",
"The whirring of a drone flying overhead",
"The beeping of a car horn in traffic",
"The sound of a door creaking open",
"The buzzing of a mosquito in the room",
"The sound of a blender mixing ingredients",
"The rumbling of a thunderstorm overhead",
"The tapping of a woodpecker on a tree trunk",
"The rustling of paper being shuffled",
"The sound of a busy office with people talking on the phone and typing on their keyboards",
"The sound of a construction site with heavy machinery and drilling",
"The sound of a dishwasher running in the kitchen",
"The chirping of birds in a forest",
"The sound of a police siren in the distance",
"The whistling of wind through tall grass",
"The sound of a cash register in a busy store",
"The buzzing of a fly or bee flying around",
"The sound of a bicycle bell ringing",
"The crackling of a fire in a fireplace"
]

That is all the text prompts needed for dataset generation!
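Because both snippets above assign to the same name, prompts, it can help to keep the two classes apart explicitly before generating audio. The following is a minimal sketch of one way to do that; the label names baby_cry and noise, and the helper names normalize and build_manifest, are assumptions made for illustration and are not taken from the author's code.

```python
import re
from typing import Dict, List, Tuple


def normalize(prompt: str) -> str:
    """Lower-case and collapse whitespace so near-identical prompts compare equal."""
    return re.sub(r"\s+", " ", prompt.strip().lower())


def build_manifest(prompts_by_label: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (label, prompt) pairs with case/whitespace duplicates removed."""
    seen = set()
    manifest = []
    for label, prompts in prompts_by_label.items():
        for prompt in prompts:
            key = (label, normalize(prompt))
            if key in seen:
                continue  # skip a repeated prompt within the same class
            seen.add(key)
            manifest.append((label, prompt.strip()))
    return manifest


# Shortened stand-ins for the two lists above; the label names are assumptions.
example = {
    "baby_cry": ["Baby Crying", "baby  crying", "Infant crying"],
    "noise": ["A noise of nature", "The tapping of a keyboard"],
}
for label, prompt in build_manifest(example):
    print(f"{label}\t{prompt}")
```

Keeping the label next to each prompt makes it easier to write each class into its own output folder later.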

Installing AudioLDM (text-to-audio) for dataset generation

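To generate audio files from the text, the next step uses a text-to-audio generation tool called AudioLDM, developed by researchers at the University of Surrey and Imperial College London, UK. The tool uses a latent diffusion model to generate high-quality audio from text. To use AudioLDM you need a standalone computer with a powerful CPU. A dedicated GPU is recommended but not mandatory. To test what AudioLDM can do, you can try it online through Hugging Face.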

We will now configure our Python environment. To manage virtual environments we will use virtualenv, which can be installed as follows:

sudo pip3 install virtualenv virtualenvwrapper

For virtualenv to work, we need to add a few lines to the ~/.bashrc file. Open it:

nano ~/.bashrc

and add the following lines:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

To apply the changes, run the following command:

source ~/.bashrc

Now we can create a virtual environment with the mkvirtualenv command.

mkvirtualenv audioldm -p python3

Install PyTorch using pip.

pip3 install torch==2.0.0

Then install the audioldm package.

pip3 install audioldm

Then run the following command to generate audio files from the ChatGPT-generated text prompts. The script can be found in the GitHub code section below.

python3 generate.py

You should get output like the following:

genereated: A hammer is hitting a wooden surface
genereated: A noise of nature
genereated: The sound of waves crashing on the shore
genereated: A thunderstorm in the distance
genereated: Traffic noise on a busy street
genereated: The hum of an air conditioning unit
genereated: Birds chirping in the morning
genereated: The sound of a train passing by
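The author's generate.py is not reproduced in this section. As a rough, hypothetical stand-in, the sketch below loops over a prompt list and builds one command-line call per prompt. It assumes that the audioldm package installs a command-line tool of the same name that accepts -t for the text prompt and -s for the save directory; please confirm with audioldm -h before relying on these flags. The function names and the output folder are illustrative only.

```python
import shlex
import subprocess
from typing import List


def audioldm_command(prompt: str, save_dir: str) -> List[str]:
    """Build one CLI call. The -t and -s flags are assumptions; verify with `audioldm -h`."""
    return ["audioldm", "-t", prompt, "-s", save_dir]


def generate_all(prompts: List[str], save_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Print (and, if dry_run is False, run) one audioldm command per prompt."""
    commands = []
    for prompt in prompts:
        cmd = audioldm_command(prompt, save_dir)
        commands.append(cmd)
        if dry_run:
            print("would run:", shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if the tool exits with an error
            print("generated:", prompt)
    return commands


# Dry run by default, so nothing is downloaded or executed here.
generate_all(["A hammer is hitting a wooden surface", "A noise of nature"], "./output/noise")
```

Passing the prompt as a separate list element, rather than formatting a shell string, avoids quoting problems with prompts that contain apostrophes such as "a book's pages".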

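Once the wav audio samples are collected, they can be fed to the neural network to start training a model that automatically detects whether a baby is crying or only background noise is present.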
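Before uploading, it may be worth checking that every generated file has the format the training pipeline expects. The sketch below uses only the standard library wave module, which reads uncompressed PCM wav files. The target of 16 kHz mono is an assumption here and should be checked against your own Edge Impulse project settings; the helper names are illustrative.

```python
import wave
from typing import Tuple


def wav_info(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration


def is_usable(path: str, expected_rate: int = 16000) -> bool:
    """True if the file is mono and has the expected sample rate (assumed 16 kHz)."""
    rate, channels, _ = wav_info(path)
    return rate == expected_rate and channels == 1
```

Calling is_usable on each file in the output folder and listing the ones that fail gives a quick sanity check before training.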

Model training with the Edge Impulse platform

Edge Impulse is a web-based tool that helps us quickly and easily create AI models for a wide range of projects. A machine learning model can be created in a few simple steps, and users can build a custom classifier with nothing more than a web browser.

Go to the Arduino Cloud platform, enter your credentials at Login (or create an account), and start a new project.

Download the Google Speech Commands Dataset to obtain data for the background noise class. The dataset can be downloaded as follows.

wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
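Only the background noise recordings are needed from this archive, so there is no need to unpack all of it. The sketch below extracts just the wav files under one folder. The folder name _background_noise_ is an assumption about the archive layout; list the archive contents first if the function returns nothing. The function name is illustrative.

```python
import os
import tarfile
from pathlib import PurePosixPath
from typing import List

NOISE_DIR = "_background_noise_"  # assumed folder name inside the archive


def extract_background_noise(archive_path: str, dest_dir: str) -> List[str]:
    """Extract only the wav files stored under the background-noise folder."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            path = PurePosixPath(member.name)
            if member.isfile() and NOISE_DIR in path.parts and path.suffix == ".wav":
                member.name = path.name  # flatten: write the file directly into dest_dir
                archive.extract(member, dest_dir)
                extracted.append(member.name)
    return sorted(extracted)
```

A call such as extract_background_noise("speech_commands_v0.02.tar.gz", "noise_wavs") would then leave only the noise recordings in noise_wavs.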

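Upload the synthetic wav audio files together with the background noise class from the Google Speech Commands Dataset. In my case, I uploaded about 500 wav files. If needed, you can add more files later by labeling them, uploading them under Data acquisition, and retraining the model.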
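Labeling can be made less error-prone by encoding the class in each file name before upload. To my understanding, the Edge Impulse uploader can infer a label from the part of the file name before the first dot, but treat that as an assumption and verify it in the Data acquisition view. The sketch below copies files into a staging folder using a label.index.wav pattern; the function name and label values are illustrative.

```python
import shutil
from pathlib import Path
from typing import List


def stage_for_upload(src_dir: str, dest_dir: str, label: str) -> List[str]:
    """Copy wav files to dest_dir as '<label>.<index>.wav' and return the new names."""
    if "." in label:
        raise ValueError("label must not contain a dot")  # the dot separates label and index
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for index, src in enumerate(sorted(Path(src_dir).glob("*.wav")), start=1):
        name = f"{label}.{index:04d}.wav"
        shutil.copyfile(src, dest / name)  # originals are left untouched
        staged.append(name)
    return staged
```

For example, stage_for_upload("output/baby_cry", "staged", "baby_cry") followed by the same call for the noise folder would give one flat folder ready for upload.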

Once all the classes are set up and you are happy with your dataset, it is time to train the model. Navigate to Create Impulse in the left navigation menu.

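Select Add a processing block and add Audio (Syntiant), since it is well suited to development boards based on the Syntiant NDP120. It converts the audio into features based on time and frequency characteristics, which helps with classification. Then select Add a learning block and add Classification with two output classes.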

Then navigate to Syntiant. Under Syntiant we will keep the default parameters. Click Save parameters.

Finally, click the Generate features button. You should get a response like the one shown below.

Press the Start training button to train the model. This process may take around 5-10 minutes, depending on the size of your dataset. If everything goes well, you should see the following in Edge Impulse.

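We obtained 90.7% validation accuracy. Accuracy of 100% on the training dataset is not something to aim for, since it usually indicates an overfitted model. As a rule of thumb for this project, anything above 70% is treated as good model performance. Increasing the number of training epochs may raise this accuracy score.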

The .tflite file is our model. The final quantized model file (int8) is about 5 KB in size and has an accuracy close to 90%.

It is always interesting to look at a model's architecture and its input and output formats and shapes. You can use a program such as Netron to view the neural network.

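Click serving_default_x:0: we can see that the input is of type int8 with shape [1, 1600]. Now let's look at the output: we have 2 classes, so the output shape is [1, 2]. Quantization can reduce model performance, because going from a 32-bit floating point to an 8-bit integer representation implies a loss of precision.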
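To make the precision loss concrete, here is a small sketch of generic affine int8 quantization, where a float x is stored as q = round(x / scale) + zero_point and clamped to [-128, 127]. This illustrates the general idea only; it is not necessarily the exact scheme or the parameters used for this model, and the value range below is an arbitrary assumption.

```python
from typing import Tuple


def quant_params(lo: float, hi: float) -> Tuple[float, int]:
    """Scale and zero point mapping the float range [lo, hi] onto int8 [-128, 127]."""
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a float to int8, clamping values that fall outside the range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an int8 value back to the float it approximates."""
    return (q - zero_point) * scale


# Assumed example range; a real model stores its own scale and zero point per tensor.
scale, zp = quant_params(-1.28, 1.27)
for x in [-1.0, -0.3337, 0.0, 0.5, 1.0]:
    q = quantize(x, scale, zp)
    back = dequantize(q, scale, zp)
    print(f"{x:+.4f} -> {q:4d} -> {back:+.4f} (error {abs(x - back):.4f})")
```

For values inside the range, the round-trip error is at most half of scale, which is why a narrow value range loses less precision than a wide one.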

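Once the model is built, go to the Deployment section and deploy it to one of the supported edge devices. ML model deployment is the process of putting a trained and tested ML model into a production environment, such as an edge device, where it can be used for its intended purpose.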

Go to the Deployment tab in Edge Impulse and click your edge device firmware type. Here, it is Arduino Nicla Voice.

You may see the following log messages:

Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device.
Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)

This information matters because it shows how memory-efficient the model is and whether it can be deployed on a resource-constrained device such as the Arduino Nicla Voice.
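As a quick worked example, the figures in that log can be turned into a usage percentage: 1.375 KB out of 640.0 KB is about 0.21% of the available parameter memory. The sketch below parses the two values from the log text quoted above; the function name and the regular expressions are illustrative and assume the log keeps this wording.

```python
import re

LOG = (
    "Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device. "
    "Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)"
)


def parse_report(text: str) -> dict:
    """Pull the memory and energy figures out of the deployment log text."""
    mem = re.search(r"Total Parameter Memory: ([\d.]+) KB out of ([\d.]+) KB", text)
    energy = re.search(r"Energy/Inference at [\d.]+V: ([\d.]+)", text)
    if mem is None or energy is None:
        raise ValueError("log text does not match the expected wording")
    used_kb, total_kb = float(mem.group(1)), float(mem.group(2))
    return {
        "used_kb": used_kb,
        "total_kb": total_kb,
        "used_percent": 100.0 * used_kb / total_kb,
        "energy_uj": float(energy.group(1)),
    }


report = parse_report(LOG)
print(f"parameter memory used: {report['used_percent']:.2f}%")
print(f"energy per inference: {report['energy_uj']} uJ")
```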

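I took the training data and trained a model in the cloud using the Edge Impulse platform, and we are now running that model locally on the Arduino Nicla Voice. So we can say it has been successfully deployed to an edge device. The project could be improved further by adding trigger actions, such as turning on a light or sending a notification to a smartphone.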
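One possible shape for such a trigger is sketched below: rather than reacting to every single classification, it fires once when most of the recent results are the cry class, and stays quiet until the crying has stopped. This is a host-side illustration in Python, not code from the project; the label name baby_cry, the window of 5, and the threshold of 3 are all assumptions to tune.

```python
from collections import deque
from typing import Deque


class CryTrigger:
    """Fire once when at least `needed` of the last `window` labels are the cry label."""

    def __init__(self, cry_label: str = "baby_cry", window: int = 5, needed: int = 3):
        self.cry_label = cry_label
        self.needed = needed
        self.recent: Deque[bool] = deque(maxlen=window)
        self.active = False

    def update(self, label: str) -> bool:
        """Feed one classification result; return True only when a new alarm starts."""
        self.recent.append(label == self.cry_label)
        crying = sum(self.recent) >= self.needed
        fired = crying and not self.active  # rising edge only, avoids repeated alerts
        self.active = crying
        return fired


trigger = CryTrigger()
stream = ["noise", "baby_cry", "baby_cry", "noise", "baby_cry",
          "baby_cry", "noise", "noise", "noise", "noise"]
for step, label in enumerate(stream):
    if trigger.update(label):
        print(f"step {step}: send notification")
```

Requiring several detections within a short window trades a small delay for fewer false alarms from a single misclassified frame.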

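In conclusion, combining TinyML with synthetic data generated through text-to-audio and ChatGPT can make detecting and responding to a baby's cry more efficient and accurate. The project demonstrates that artificially generated data is effective and reduces the need for manual dataset searching.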

Feel free to leave a comment below. Thank you for reading!