
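Mara-pipelines is a lightweight data transformation framework that aims for transparency and low complexity. Its other main features are:

Pipelines are written in very simple Python code.

PostgreSQL is used as the data processing engine.

A web interface visualizes and analyzes pipeline runs.

Pipelines run on a single machine using Python's multiprocessing. No distributed task queue is needed, which makes debugging and log output straightforward.

Cost-based priority queue: nodes with a higher cost (based on recorded run times) are run first.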
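The cost-based ordering described in the last item can be illustrated with a small standalone sketch. This is not mara-pipelines' own scheduler; the function `order_by_cost`, the `recorded_runtimes` dictionary, and the sample numbers are all hypothetical and only show the idea of starting the most expensive ready node first.

```python
import heapq


def order_by_cost(ready_nodes, recorded_runtimes, default_cost=0.0):
    """Return node ids ordered so that the most expensive node comes first.

    ready_nodes: node ids whose upstream nodes have all finished.
    recorded_runtimes: node id -> average run time in seconds from earlier
    runs (hypothetical data, not read from mara's tables).
    """
    # heapq is a min-heap, so the cost is negated to pop the largest first.
    heap = [(-recorded_runtimes.get(node, default_cost), node)
            for node in ready_nodes]
    heapq.heapify(heap)

    ordered = []
    while heap:
        _, node = heapq.heappop(heap)
        ordered.append(node)
    return ordered


# Made-up run times; a node without history falls back to default_cost.
runtimes = {'ping_google': 2.1, 'ping_amazon': 3.4, 'ping_facebook': 2.8}
print(order_by_cost(
    ['ping_google', 'ping_amazon', 'ping_facebook', 'ping_foo'], runtimes))
```

Starting long-running nodes early tends to shorten the total wall-clock time when several worker processes are available, because the short nodes can fill the remaining gaps.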

In addition, the Mara-pipelines web interface lets you not only view and manage pipelines and their task nodes, but also trigger those pipelines and nodes directly, which is very convenient.

1. Installation

Because it relies on a large number of dependencies, Mara-pipelines does not run natively on Windows. If you need to use it on Windows, use Docker or the Windows Subsystem for Linux.

Install Mara-pipelines with pip:

pip install mara-pipelines

Or install directly from the repository:

pip install git+https://github.com/mara/mara-pipelines.git

2. Usage example

The following is a basic pipeline demo made up of three interdependent nodes: task 1 (ping_localhost), a sub-pipeline (sub_pipeline), and task 2 (sleep):

# Note: this example pings several public websites. If they are not
# reachable from your network, replace them with hosts that are.
from mara_pipelines.commands.bash import RunBash
from mara_pipelines.pipelines import Pipeline, Task
from mara_pipelines.ui.cli import run_pipeline, run_interactively

pipeline = Pipeline(
    id='demo',
    description='A small pipeline that demonstrates the interplay between pipelines, tasks and commands')

# Task 1: no upstream nodes, so it can start immediately
pipeline.add(Task(id='ping_localhost', description='Pings localhost',
                  commands=[RunBash('ping -c 3 localhost')]))

sub_pipeline = Pipeline(id='sub_pipeline', description='Pings a number of hosts')

# One ping task per host inside the sub-pipeline
for host in ['google', 'amazon', 'facebook']:
    sub_pipeline.add(Task(id=f'ping_{host}', description=f'Pings {host}',
                          commands=[RunBash(f'ping -c 3 {host}.com')]))

sub_pipeline.add_dependency('ping_amazon', 'ping_facebook')
sub_pipeline.add(Task(id='ping_foo', description='Pings foo',
                      commands=[RunBash('ping foo')]), ['ping_amazon'])

# The sub-pipeline runs after ping_localhost
pipeline.add(sub_pipeline, ['ping_localhost'])

# Task 2: runs after the whole sub-pipeline
pipeline.add(Task(id='sleep', description='Sleeps for 2 seconds',
                  commands=[RunBash('sleep 2')]), ['sub_pipeline'])

As you can see, a Task holds a list of commands, and these commands are what actually perform the work.

In pipeline.add, the first argument is the node to add and the second is the list of that node's upstream nodes. For example:

pipeline.add(sub_pipeline, ['ping_localhost'])

means that sub_pipeline runs only after ping_localhost has finished.
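To make the effect of the upstream lists concrete, here is a minimal sketch that derives a valid execution order for the three top-level nodes of the demo pipeline. It uses the standard library's `graphlib` (Python 3.9+) rather than mara-pipelines itself; the `upstreams` dictionary and the helper `execution_order` are illustrative assumptions, not part of the library.

```python
from graphlib import TopologicalSorter

# node id -> set of upstream node ids, mirroring the demo pipeline's
# top-level nodes (hand-written here, not read from a Pipeline object)
upstreams = {
    'ping_localhost': set(),
    'sub_pipeline': {'ping_localhost'},
    'sleep': {'sub_pipeline'},
}


def execution_order(upstreams):
    """Return one order in which every node comes after its upstreams."""
    return list(TopologicalSorter(upstreams).static_order())


print(execution_order(upstreams))
```

Because these three nodes form a simple chain, only one order is possible: ping_localhost, then sub_pipeline, then sleep.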

To run this pipeline, you need to configure a PostgreSQL database that stores run-time information, run output, and the state of incremental processing:

import mara_db.auto_migration
import mara_db.config
import mara_db.dbs

# Register the database under the alias 'mara'
mara_db.config.databases \
    = lambda: {'mara': mara_db.dbs.PostgreSQLDB(host='localhost', user='root', database='example_etl_mara')}

mara_db.auto_migration.auto_discover_models_and_migrate()

If PostgreSQL is running and the credentials are correct, the output looks like this (a database containing several tables is created):

Created database "postgresql+psycopg2://root@localhost/example_etl_mara"

CREATE TABLE data_integration_file_dependency (
    node_path TEXT[] NOT NULL,
    dependency_type VARCHAR NOT NULL,
    hash VARCHAR,
    timestamp TIMESTAMP WITHOUT TIME ZONE,
    PRIMARY KEY (node_path, dependency_type)
);

.. more tables

To run the whole pipeline:

from mara_pipelines.ui.cli import run_pipeline

run_pipeline(pipeline)

The following call, by contrast, runs a single node of sub_pipeline together with all the nodes it depends on:

run_pipeline(sub_pipeline, nodes=[sub_pipeline.nodes['ping_amazon']], with_upstreams=True)
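What "together with all the nodes it depends on" selects can be sketched as a transitive walk over the upstream relation. The code below is a standalone illustration, not the library's implementation: the helper `with_upstreams` and the hand-written `sub_pipeline_upstreams` dictionary are hypothetical, and the dictionary assumes that the first argument of `add_dependency` is the upstream node and the second the downstream node.

```python
def with_upstreams(node, upstreams):
    """Return the node plus every node it transitively depends on."""
    selected = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in selected:
            continue  # already visited via another path
        selected.add(current)
        stack.extend(upstreams.get(current, ()))
    return selected


# node id -> set of upstream node ids for the demo sub-pipeline
# (assumption: add_dependency('ping_amazon', 'ping_facebook') makes
# ping_amazon the upstream of ping_facebook)
sub_pipeline_upstreams = {
    'ping_google': set(),
    'ping_amazon': set(),
    'ping_facebook': {'ping_amazon'},
    'ping_foo': {'ping_amazon'},
}

print(sorted(with_upstreams('ping_facebook', sub_pipeline_upstreams)))
print(sorted(with_upstreams('ping_amazon', sub_pipeline_upstreams)))
```

Under that assumption, ping_amazon has no upstream nodes inside the sub-pipeline, so selecting it with its upstreams yields only ping_amazon itself, whereas ping_facebook would also pull in ping_amazon.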

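3. Web interface

In my view, the most useful part of mara-pipelines is its Flask-based web interface for managing and monitoring pipelines.

For each pipeline, there is a page that shows:

A graph of all child nodes and the dependencies between them

A chart of the pipeline's overall run time and of its most expensive nodes over the last 30 days (configurable)

A table of all the pipeline's nodes with their average run times and the resulting queuing priority

The output and timeline of the pipeline's last run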
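As a rough illustration of how average run times over a recent window could be derived from run records, consider the sketch below. It does not query mara-pipelines' tables; the record layout `(node_id, started_at, seconds)` and the function `average_runtimes` are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_runtimes(runs, now, days=30):
    """Average run time per node over the last `days` days.

    runs: iterable of (node_id, started_at, seconds) tuples
    (a hypothetical record layout, not mara's schema).
    """
    cutoff = now - timedelta(days=days)
    totals = defaultdict(lambda: [0.0, 0])  # node id -> [sum, count]
    for node_id, started_at, seconds in runs:
        if started_at >= cutoff:
            totals[node_id][0] += seconds
            totals[node_id][1] += 1
    return {node: total / count for node, (total, count) in totals.items()}


now = datetime(2024, 1, 31)
runs = [
    ('ping_amazon', now - timedelta(days=1), 3.0),
    ('ping_amazon', now - timedelta(days=2), 5.0),
    ('ping_amazon', now - timedelta(days=40), 100.0),  # outside the window
    ('sleep', now - timedelta(days=3), 2.0),
]
print(average_runtimes(runs, now))
```

Averages like these are the kind of recorded cost that a cost-based queue can use to decide which node to start first.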

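For each task, there is a page that shows:

The task's upstream and downstream nodes in the pipeline

The task's run times over the last 30 days

All of the task's commands

The output of the task's last run

In addition, pipelines and tasks can be run directly from the web page, which is a great feature.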

Editor: haq
