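1. Problem Overview
Although the software's low-level module automatically reconnects to the server once the network recovers, by then the meeting has already been exited because of the network problem, and the user has to rejoin it. The customer's network environment is unusual and suffers frequent jitter and instability, so the customer requires that if the network recovers within 60 seconds, the client must still be in the meeting and the meeting must not be interrupted.
The customer insists on this feature. The project is nearly finished and is in the customer trial phase; without the feature the project will not pass acceptance and the customer will not pay.
Our field colleagues reported the problem and the project status to the R&D managers, and R&D called an urgent meeting to work out how to stay in a meeting through a 60-second outage. Two kinds of network connection are involved: TCP connections that carry control signaling, and UDP connections that carry the audio and video streams. The UDP connections are not much of a problem; the difficulty lies in TCP disconnection and reconnection, so the rest of this article focuses on TCP.
When an unstable network causes a meeting drop, either the system's TCP/IP stack has detected the network fault and torn down the connection at the protocol layer, or the application-layer heartbeat has detected the fault and closed the connection to the server. When the TCP/IP stack itself detects the fault, there are again two possibilities: the stack's own keep-alive (heartbeat) mechanism detected it, or TCP's packet-loss retransmission mechanism did.
For the application-layer heartbeat we can simply lengthen the timeout. This article mainly discusses the TCP/IP stack's keep-alive, retransmission, and connect-timeout mechanisms. Once a network fault is detected, our lower layer can reconnect automatically, or a signaling send can trigger the reconnect, while the business module keeps the meeting resources instead of releasing them. When the network recovers, the client is still in the meeting, continues to receive the audio and video streams, and can carry on with in-meeting operations.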
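The idea of holding on to meeting resources for a bounded reconnection window could be sketched as a small state machine. This is only an illustration: `MeetingSessionKeeper`, `LinkState`, and the method names are hypothetical and do not come from the product's code; the 60-second default is the customer requirement stated above.

```cpp
#include <chrono>

// Hypothetical sketch: none of these names come from the product's code.
enum class LinkState { Connected, Reconnecting, Dropped };

class MeetingSessionKeeper {
public:
    using Clock = std::chrono::steady_clock;

    explicit MeetingSessionKeeper(std::chrono::seconds grace = std::chrono::seconds(60))
        : grace_(grace) {}

    // Called when the TCP signaling link is reported broken.
    // Meeting resources are kept; only the grace timer starts.
    void OnLinkLost(Clock::time_point now) {
        if (state_ == LinkState::Connected) {
            state_ = LinkState::Reconnecting;
            lostAt_ = now;
        }
    }

    // Called when the lower layer has reconnected to the server.
    // Returns true if the meeting can continue without rejoining.
    bool OnLinkRestored(Clock::time_point now) {
        Tick(now);
        if (state_ == LinkState::Reconnecting) {
            state_ = LinkState::Connected;
            return true;
        }
        return state_ == LinkState::Connected;
    }

    // Called periodically; gives up once the grace period has elapsed.
    void Tick(Clock::time_point now) {
        if (state_ == LinkState::Reconnecting && now - lostAt_ > grace_) {
            state_ = LinkState::Dropped;  // this is where resources would be released
        }
    }

    LinkState state() const { return state_; }

private:
    std::chrono::seconds grace_;
    LinkState state_ = LinkState::Connected;
    Clock::time_point lostAt_{};
};
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the sketch easy to exercise without real waiting.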
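2. The TCP/IP Stack's Keep-Alive Mechanism
2.1 The ACK Mechanism in TCP
A TCP connection is established with a three-way handshake: the client sends SYN, the server replies with SYN+ACK, and the client answers with ACK.
TCP is considered reliable for two reasons: a connection must be established before any data is sent, and the receiver answers the data it receives with an ACK to tell the sender that the packet arrived. If the sender does not receive an ACK for data it has sent, the retransmission mechanism is triggered.
ACKs are present both during connection setup and during data transfer afterwards, and the stack's keep-alive packets are no exception.
2.2 How the Stack's Keep-Alive Mechanism Works
The TCP/IP stack has a built-in TCP keep-alive mechanism. It is bound to an individual socket (a TCP socket), so it can be enabled for a specific socket. It is off by default and must be enabled explicitly.
On Windows, a keep-alive packet is sent every 2 hours by default. After the client sends a keep-alive packet to the server, one of two things happens:
1) Network healthy: the server receives the keep-alive packet and immediately replies with an ACK. After receiving the ACK, the client waits another 2 hours before sending the next keep-alive packet. This interval is keepalivetime; it defaults to 2 hours on Windows and is configurable. If the client and server exchange data during those 2 hours, the ACKs the client receives from the server also count as proof of liveness, and the 2-hour timer restarts.
2) Network faulty: the server never receives the keep-alive packet, so it cannot reply with an ACK. Windows waits 1 second by default and then resends the keep-alive packet. If the ACK still does not arrive, it resends again after another second. Once 10 keep-alive packets have gone unanswered, the system limit is reached, the stack concludes that the network has failed, and it closes the connection. The wait for the ACK is keepaliveinterval; it defaults to 1 second on Windows and is configurable. The number of probes is fixed at 10 on Windows (Vista and later) and cannot be changed through SIO_KEEPALIVE_VALS.
So the stack's keep-alive mechanism can also detect a network fault, but with the default settings detection may take a very long time, unless the fault happens to occur while the stack is waiting for the reply to a keep-alive packet it has just sent. In that case, if repeated probes get no ACK, the stack decides the network has failed and closes the connection itself.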
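To make "a very long time" concrete, the worst-case detection time on an idle connection can be estimated as the idle wait before the first probe plus one interval per unanswered probe. The helper below is a minimal sketch of that arithmetic; `KeepAliveParams` and `DetectionTimeMs` are names invented for this example.

```cpp
#include <cstdint>

// Hypothetical names, used only for this illustration.
struct KeepAliveParams {
    std::uint64_t idleMs;      // keepalivetime: idle time before the first probe
    std::uint64_t intervalMs;  // keepaliveinterval: wait after each unanswered probe
    unsigned probes;           // number of probes before the stack gives up
};

// Worst case for an idle connection: the fault occurs right after the last
// traffic, so the full idle time passes before probing even starts.
constexpr std::uint64_t DetectionTimeMs(const KeepAliveParams& p) {
    return p.idleMs + static_cast<std::uint64_t>(p.probes) * p.intervalMs;
}
```

With the Windows defaults described above (2 hours, 1 second, 10 probes) this gives 7,210,000 ms, a little over two hours. With, say, a 10-second idle time and a 2-second interval (example values), it gives 30,000 ms, which fits inside a 60-second budget.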
2.3 Changing the Stack's Default Keep-Alive Parameters
Enabling keep-alive does not turn it on for the whole protocol stack; it is enabled per socket.
Once keep-alive is enabled, its timing parameters can also be changed. In code, first call setsockopt to enable keep-alive on the target socket, then call WSAIoctl to override the default timing parameters:
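SOCKET sock;
// ...... (intermediate code omitted)

// Enable TCP keep-alive on this socket (system default timings apply for now)
int optval = 1;
int nRet = setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (const char *)&optval,
                      sizeof(optval));
if (nRet != 0)
    return;

// Override the default timings; both fields are in milliseconds
tcp_keepalive alive;
alive.onoff = TRUE;
alive.keepalivetime = 10 * 1000;     // idle time before the first probe: 10 s
alive.keepaliveinterval = 2 * 1000;  // wait between unanswered probes: 2 s
DWORD dwBytesRet = 0;
nRet = WSAIoctl(sock, SIO_KEEPALIVE_VALS, &alive, sizeof(alive), NULL, 0,
                &dwBytesRet, NULL, NULL);
if (nRet != 0)
    return;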
As the code shows, setsockopt is called first with SO_KEEPALIVE to switch on keep-alive for the TCP connection; at that point the system default parameters are in effect. WSAIoctl is then called with SIO_KEEPALIVE_VALS, passing in the tcp_keepalive structure filled with the desired timings.
The fields of the tcp_keepalive structure are explained below, using Windows as the example:
1) keepalivetime: by default a keep-alive probe is sent every 2 hours, that is, 2 hours after the previous one. Any data exchanged in the meantime counts as proof of liveness, so no probe is sent during that period, and the timer for the next probe restarts from zero at the moment the last data was sent or received.
2) keepaliveinterval: the time to wait for the peer's ACK after sending a probe, 1 second by default. Suppose the network to the peer has failed: the first probe is sent and no ACK arrives within 1 second, so a second probe is sent; again no ACK arrives within 1 second, so another probe is sent, and so on. When the 10th probe has also gone 1 second without an ACK, the limit of 10 probes is reached and the network is considered down.
3) Probe count: fixed at 10 on Windows and cannot be changed.
MSDN describes a connection dropped by keep-alive as follows:
If a connection is dropped as the result of keep-alives the error code WSAENETRESET is returned to any calls in progress on the socket, and any subsequent calls will fail with WSAENOTCONN.
In other words, when the connection is dropped because the keep-alive probe limit is reached, any socket call in progress returns the error code WSAENETRESET, and every subsequent socket API call fails with WSAENOTCONN.
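An application that wants to reconnect automatically might map these two error codes to a "reconnect" decision. The following is a rough sketch under that assumption: `SocketAction` and `ActionForError` are hypothetical names, and the numeric constants are defined locally only so the snippet also compiles off Windows (they match the Winsock values).

```cpp
#ifdef _WIN32
#include <winsock2.h>
#else
// Defined here only so the sketch builds on non-Windows systems.
constexpr int WSAENETRESET = 10052;
constexpr int WSAENOTCONN  = 10057;
#endif

// Hypothetical: what the caller should do after a socket call fails.
enum class SocketAction { Reconnect, HandleOtherwise };

inline SocketAction ActionForError(int err) {
    switch (err) {
    case WSAENETRESET:  // keep-alive gave up while a call was in progress
    case WSAENOTCONN:   // a later call on the already-dropped connection
        return SocketAction::Reconnect;
    default:
        return SocketAction::HandleOtherwise;  // other errors need their own handling
    }
}
```

On Windows the argument would typically be the value returned by `WSAGetLastError()` right after the failing call.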
3. The Heartbeat in the libwebsockets Open-Source Library Is the TCP/IP Stack's Keep-Alive
Our product previously hit a problem with websocket: because no heartbeat was configured, a long-lived TCP connection was silently released by a network device.
When our client logs in, it connects to the registration server of one of our services over a long-lived websocket connection. The connection stays open, but it carries data only when that service module is in use. If the user never touches that module after logging in, the connection stays idle, with no data exchanged on it.
During one test run something went wrong, and the investigation showed that an intermediate network device had closed the connection because it had been idle for too long. To fix this, we set the keep-alive parameters when initializing the websocket library, so that keep-alive packets flow while the connection is idle and it is no longer closed for lack of traffic.
The lws_context_creation_info structure passed to lws_create_context when creating the websockets context has fields for the keep-alive parameters:
/**
- struct lws_context_creation_info - parameters to create context with
- This is also used to create vhosts.... if LWS_SERVER_OPTION_EXPLICIT_VHOSTS
- is not given, then for backwards compatibility one vhost is created at
- context-creation time using the info from this struct.
- If LWS_SERVER_OPTION_EXPLICIT_VHOSTS is given, then no vhosts are created
- at the same time as the context, they are expected to be created afterwards.
- @port: VHOST: Port to listen on... you can use CONTEXT_PORT_NO_LISTEN to
suppress listening on any port, that's what you want if you are
not running a websocket server at all but just using it as a
client
- @iface: VHOST: NULL to bind the listen socket to all interfaces, or the
interface name, eg, "eth2"
If options specifies LWS_SERVER_OPTION_UNIX_SOCK, this member is
the pathname of a UNIX domain socket. you can use the UNIX domain
sockets in abstract namespace, by prepending an @ symbole to the
socket name.
- @protocols: VHOST: Array of structures listing supported protocols and a protocol-
specific callback for each one. The list is ended with an
entry that has a NULL callback pointer.
It's not const because we write the owning_server member
- @extensions: VHOST: NULL or array of lws_extension structs listing the
extensions this context supports. If you configured with
--without-extensions, you should give NULL here.
- @token_limits: CONTEXT: NULL or struct lws_token_limits pointer which is initialized
with a token length limit for each possible WSI_TOKEN_***
- @ssl_cert_filepath: VHOST: If libwebsockets was compiled to use ssl, and you want
to listen using SSL, set to the filepath to fetch the
server cert from, otherwise NULL for unencrypted
- @ssl_private_key_filepath: VHOST: filepath to private key if wanting SSL mode;
if this is set to NULL but sll_cert_filepath is set, the
OPENSSL_CONTEXT_REQUIRES_PRIVATE_KEY callback is called
to allow setting of the private key directly via openSSL
library calls
- @ssl_ca_filepath: VHOST: CA certificate filepath or NULL
- @ssl_cipher_list: VHOST: List of valid ciphers to use (eg,
"RC4-MD5:RC4-SHA:AES128-SHA:AES256-SHA:HIGH:!DSS:!aNULL"
or you can leave it as NULL to get "DEFAULT"
- @http_proxy_address: VHOST: If non-NULL, attempts to proxy via the given address.
If proxy auth is required, use format
"username:password@server:port"
- @http_proxy_port: VHOST: If http_proxy_address was non-NULL, uses this port at
the address
- @gid: CONTEXT: group id to change to after setting listen socket, or -1.
- @uid: CONTEXT: user id to change to after setting listen socket, or -1.
- @options: VHOST + CONTEXT: 0, or LWS_SERVER_OPTION_... bitfields
- @user: CONTEXT: optional user pointer that can be recovered via the context
pointer using lws_context_user
- @ka_time: CONTEXT: 0 for no keepalive, otherwise apply this keepalive timeout to
all libwebsocket sockets, client or server
- @ka_probes: CONTEXT: if ka_time was nonzero, after the timeout expires how many
times to try to get a response from the peer before giving up
and killing the connection
- @ka_interval: CONTEXT: if ka_time was nonzero, how long to wait before each ka_probes
attempt
- @provided_client_ssl_ctx: CONTEXT: If non-null, swap out libwebsockets ssl
implementation for the one provided by provided_ssl_ctx.
Libwebsockets no longer is responsible for freeing the context
if this option is selected.
- @max_http_header_data: CONTEXT: The max amount of header payload that can be handled
in an http request (unrecognized header payload is dropped)
- @max_http_header_pool: CONTEXT: The max number of connections with http headers that
can be processed simultaneously (the corresponding memory is
allocated for the lifetime of the context). If the pool is
busy new incoming connections must wait for accept until one
becomes free.
- @count_threads: CONTEXT: how many contexts to create in an array, 0 = 1
- @fd_limit_per_thread: CONTEXT: nonzero means restrict each service thread to this
many fds, 0 means the default which is divide the process fd
limit by the number of threads.
- @timeout_secs: VHOST: various processes involving network roundtrips in the
library are protected from hanging forever by timeouts. If
nonzero, this member lets you set the timeout used in seconds.
Otherwise a default timeout is used.
- @ecdh_curve: VHOST: if NULL, defaults to initializing server with "prime256v1"
- @vhost_name: VHOST: name of vhost, must match external DNS name used to
access the site, like "warmcat.com" as it's used to match
Host: header and / or SNI name for SSL.
- @plugin_dirs: CONTEXT: NULL, or NULL-terminated array of directories to
scan for lws protocol plugins at context creation time
- @pvo: VHOST: pointer to optional linked list of per-vhost
options made accessible to protocols
- @keepalive_timeout: VHOST: (default = 0 = 60s) seconds to allow remote
client to hold on to an idle HTTP/1.1 connection
- @log_filepath: VHOST: filepath to append logs to... this is opened before
any dropping of initial privileges
- @mounts: VHOST: optional linked list of mounts for this vhost
- @server_string: CONTEXT: string used in HTTP headers to identify server
software, if NULL, "libwebsockets".
 */
struct lws_context_creation_info {
    int port;                                       /* VH */
    const char *iface;                              /* VH */
    const struct lws_protocols *protocols;          /* VH */
    const struct lws_extension *extensions;         /* VH */
    const struct lws_token_limits *token_limits;    /* context */
    const char *ssl_private_key_password;           /* VH */
    const char *ssl_cert_filepath;                  /* VH */
    const char *ssl_private_key_filepath;           /* VH */
    const char *ssl_ca_filepath;                    /* VH */
    const char *ssl_cipher_list;                    /* VH */
    const char *http_proxy_address;                 /* VH */
    unsigned int http_proxy_port;                   /* VH */
    int gid;                                        /* context */
    int uid;                                        /* context */
    unsigned int options;                           /* VH + context */
    void *user;                                     /* context */
    int ka_time;                                    /* context */
    int ka_probes;                                  /* context */
    int ka_interval;                                /* context */
#ifdef LWS_OPENSSL_SUPPORT
    SSL_CTX *provided_client_ssl_ctx;               /* context */
#else /* maintain structure layout either way */
    void *provided_client_ssl_ctx;
#endif
    short max_http_header_data;                     /* context */
    short max_http_header_pool;                     /* context */
    unsigned int count_threads;                     /* context */
    unsigned int fd_limit_per_thread;               /* context */
    unsigned int timeout_secs;                      /* VH */
    const char *ecdh_curve;                         /* VH */
    const char *vhost_name;                         /* VH */
    const char * const *plugin_dirs;                /* context */
    const struct lws_protocol_vhost_options *pvo;   /* VH */
    int keepalive_timeout;                          /* VH */
    const char *log_filepath;                       /* VH */
    const struct lws_http_mount *mounts;            /* VH */
    const char *server_string;                      /* context */
    /* Add new things just above here ---^
     * This is part of the ABI, don't needlessly break compatibility
     * The below is to ensure later library versions with new
     * members added above will see 0 (default) even if the app
     * was not built against the newer headers.
     */
    void *_unused[8];
};
The three fields ka_time, ka_probes, and ka_interval are the keep-alive settings. Our code for initializing the websockets context is:
static lws_context* CreateContext()
{
    lws_set_log_level( 0xFF, NULL );
    lws_context* plcContext = NULL;
    lws_context_creation_info tCreateinfo;
    memset(&tCreateinfo, 0, sizeof tCreateinfo);
    tCreateinfo.port = CONTEXT_PORT_NO_LISTEN;  // client only, no listening port
    tCreateinfo.protocols = protocols;
    // Keep-alive settings, in seconds
    tCreateinfo.ka_time = LWS_TCP_KEEPALIVE_TIME;
    tCreateinfo.ka_interval = LWS_TCP_KEEPALIVE_INTERVAL;
    tCreateinfo.ka_probes = LWS_TCP_KEEPALIVE_PROBES;
    tCreateinfo.options = LWS_SERVER_OPTION_DISABLE_IPV6;
    plcContext = lws_create_context(&tCreateinfo);
    return plcContext;
}
Reading the libwebsockets source shows that the heartbeat configured here is indeed the TCP/IP stack's keep-alive mechanism:
LWS_VISIBLE int
lws_plat_set_socket_options(struct lws_vhost *vhost, lws_sockfd_type fd)
{
    int optval = 1;
    int optlen = sizeof(optval);
    u_long optl = 1;
    DWORD dwBytesRet;
    struct tcp_keepalive alive;
    int protonbr;
#ifndef _WIN32_WCE
    struct protoent *tcp_proto;
#endif

    if (vhost->ka_time) {
        /* enable keepalive on this socket */
        // First call setsockopt to turn on the keep-alive option
        optval = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE,
                       (const char *)&optval, optlen) < 0)
            return 1;

        // Then override the timings (seconds converted to milliseconds)
        alive.onoff = TRUE;
        alive.keepalivetime = vhost->ka_time*1000;
        alive.keepaliveinterval = vhost->ka_interval*1000;
        if (WSAIoctl(fd, SIO_KEEPALIVE_VALS, &alive, sizeof(alive),
                     NULL, 0, &dwBytesRet, NULL, NULL))
            return 1;
    }

    /* Disable Nagle */
    optval = 1;
#ifndef _WIN32_WCE
    tcp_proto = getprotobyname("TCP");
    if (!tcp_proto) {
        lwsl_err("getprotobyname() failed with error %d\n", LWS_ERRNO);
        return 1;
    }
    protonbr = tcp_proto->p_proto;
#else
    protonbr = 6;
#endif
    setsockopt(fd, protonbr, TCP_NODELAY, (const char *)&optval, optlen);

    /* We are nonblocking... */
    ioctlsocket(fd, FIONBIO, &optl);

    return 0;
}
4. TCP/IP Packet-Loss Retransmission
If the client and server are exchanging TCP data when the network fails, the client stops receiving the server's ACKs for the packets it sends. This triggers TCP retransmission on the client, and the retransmission mechanism can also determine that the network has failed.
On a TCP connection, data sent without a returning ACK is retransmitted. The interval doubles with each retransmission, and when the retransmission count reaches the system limit (5 by default on Windows, 15 by default on Linux), the stack concludes that the network has failed and closes the connection.
So if data is being exchanged when the network fails, the stack (with the Windows defaults) typically detects the fault within tens of seconds and closes the connection.
You can observe retransmission by unplugging and replugging the PC's network cable while capturing packets with Wireshark. If the cable is replugged quickly (unplug it, wait a few seconds, plug it back in), commands sent to the server in the meantime still reach it thanks to retransmission.
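The doubling retransmission interval described in this section can be modelled in a few lines. This is a simplified sketch, not how any particular stack is implemented: real stacks derive the first timeout from the measured round-trip time and clamp it to minimum and maximum values, so actual timings differ. `RetransmitPlan` and `PlanRetransmits` are names made up for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names, for illustration only.
struct RetransmitPlan {
    std::vector<std::uint64_t> retransmitAtMs;  // time of each retransmission
    std::uint64_t giveUpAtMs;                   // time the connection is abandoned
};

// Simplified model: the first timeout is initialRtoMs and each later one doubles.
inline RetransmitPlan PlanRetransmits(std::uint64_t initialRtoMs, unsigned maxRetransmits) {
    RetransmitPlan plan{{}, 0};
    std::uint64_t now = 0;             // the original segment is sent at t = 0
    std::uint64_t rto = initialRtoMs;
    for (unsigned i = 0; i < maxRetransmits; ++i) {
        now += rto;                    // timeout expires, segment is resent
        plan.retransmitAtMs.push_back(now);
        rto *= 2;                      // back off
    }
    plan.giveUpAtMs = now + rto;       // the last retransmission also times out
    return plan;
}
```

In this model the total is always 2^(N+1) - 1 times the initial timeout for N retransmissions. With an assumed initial timeout of 300 ms and the Windows limit of 5 retransmissions, the connection is abandoned after 18.9 seconds, which is in line with the "tens of seconds" mentioned above.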
5. Implementing a connect Timeout with a Non-Blocking Socket and select
5.1 What MSDN Says About connect and select
For a TCP socket, we call the socket function connect to establish the TCP connection. Here is how Microsoft's MSDN describes connect:
On a blocking socket, the return value indicates success or failure of the connection attempt.
For a blocking socket, the return value of connect tells you whether the connection succeeded; 0 means success.
With a nonblocking socket, the connection attempt cannot be completed immediately. In this case, connect will return SOCKET_ERROR, and WSAGetLastError will return WSAEWOULDBLOCK. In this case, there are three possible scenarios:
Use the select function to determine the completion of the connection request by checking to see if the socket is writeable.
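For a non-blocking socket, connect returns immediately, before the connection attempt has completed. It returns SOCKET_ERROR, but on a non-blocking socket that does not by itself mean failure: call WSAGetLastError to get the error code, which in this case is normally WSAEWOULDBLOCK, meaning the connection is in progress. Use select to check whether the socket is writable (whether it is in the writefds set); if it is, the connection succeeded. If the socket is in the exceptfds set instead, the connection attempt failed.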
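5.2 Controlling the Connect Timeout with a Non-Blocking Socket and select
With a blocking socket on Windows, if the remote IP and port are unreachable, connect blocks for a long time (75 s is the figure usually quoted; the actual value depends on the system's connection retransmission settings) before returning SOCKET_ERROR to indicate failure. So when we test whether a remote IP and port can be connected to, we use a non-blocking socket instead and call select with a timeout to bound the connection attempt.
select returns 0 when it times out and SOCKET_ERROR when an error occurs, so check its return value: if it is less than or equal to 0, the connection failed and the socket should be closed immediately. A return value greater than 0 is the number of sockets that are ready, for example sockets whose connection has completed. We then check whether the socket is in the writable set writefds; if it is, the connection succeeded.
With the MSDN description in mind, implementing a connect timeout is straightforward. The code is: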
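// Tests whether a TCP connection to pszIP:nPort can be established within 1 second.
bool ConnectDevice( const char* pszIP, int nPort )
{
    // Create the TCP socket
    SOCKET connSock = socket(AF_INET, SOCK_STREAM, 0);
    if (connSock == INVALID_SOCKET)
    {
        return false;
    }

    // Fill in the IP address and port
    SOCKADDR_IN devAddr;
    memset(&devAddr, 0, sizeof(SOCKADDR_IN));
    devAddr.sin_family = AF_INET;
    devAddr.sin_port = htons((u_short)nPort);
    devAddr.sin_addr.s_addr = inet_addr(pszIP);

    // Put the socket into non-blocking mode in preparation for select
    unsigned long ulnoblock = 1;
    if (ioctlsocket(connSock, FIONBIO, &ulnoblock) != 0)
    {
        closesocket(connSock);
        return false;
    }

    // Start the connect. On a non-blocking socket it returns immediately,
    // normally with SOCKET_ERROR and WSAEWOULDBLOCK while the attempt is in progress.
    int nRet = connect(connSock, (sockaddr*)&devAddr, sizeof(devAddr));
    if (nRet == SOCKET_ERROR && WSAGetLastError() != WSAEWOULDBLOCK)
    {
        closesocket(connSock);
        return false;
    }

    // writefds reports success; exceptfds reports a failed attempt
    fd_set writefds;
    fd_set exceptfds;
    FD_ZERO(&writefds);
    FD_ZERO(&exceptfds);
    FD_SET(connSock, &writefds);
    FD_SET(connSock, &exceptfds);

    // Set the connect timeout to 1 second
    timeval tv;
    tv.tv_sec = 1;
    tv.tv_usec = 0;

    // The select function returns the total number of socket handles that are ready and contained
    // in the fd_set structures, zero if the time limit expired, or SOCKET_ERROR(-1) if an error occurred.
    if (select(0, NULL, &writefds, &exceptfds, &tv) <= 0 || !FD_ISSET(connSock, &writefds))
    {
        closesocket(connSock);
        return false; // timed out or failed to connect
    }

    // This function only probes reachability, so the socket is closed on success as well
    closesocket(connSock);
    return true;
}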
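For comparison, the same technique on a POSIX system (Linux, macOS) might look like the sketch below. This is not Winsock code and is not taken from the product: `ConnectWithTimeout` is a hypothetical name. The main difference is that on POSIX a failed non-blocking connect also makes the socket writable, so the result has to be read with `getsockopt(SO_ERROR)` after select returns.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// POSIX sketch: returns true if ip:port accepts a TCP connection within
// timeoutMs. The probe socket is always closed before returning.
inline bool ConnectWithTimeout(const char* ip, std::uint16_t port, int timeoutMs) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    bool ok = inet_pton(AF_INET, ip, &addr.sin_addr) == 1;

    // Switch to non-blocking mode so connect() returns immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    ok = ok && flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;

    if (ok) {
        int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        if (rc != 0 && errno != EINPROGRESS) {
            ok = false;  // immediate failure
        } else if (rc != 0) {
            fd_set writefds;
            FD_ZERO(&writefds);
            FD_SET(fd, &writefds);
            timeval tv{timeoutMs / 1000, (timeoutMs % 1000) * 1000};
            // 0 = timed out, -1 = error, >0 = the socket became writable
            ok = select(fd + 1, nullptr, &writefds, nullptr, &tv) > 0;
            if (ok) {
                // Writable does not yet mean connected on POSIX: read SO_ERROR.
                int soErr = 0;
                socklen_t len = sizeof(soErr);
                ok = getsockopt(fd, SOL_SOCKET, SO_ERROR, &soErr, &len) == 0 && soErr == 0;
            }
        }
    }
    close(fd);
    return ok;
}
```

A call such as `ConnectWithTimeout("192.0.2.10", 8080, 1000)` (address and port are placeholders) would then wait at most about one second for the result.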