IO模型之I/O多路复用

创始人

2024-03-15 14:47:13

0次

什么是IO多路复用？

假如我们设计了一个程序，该程序从标准输入接收数据输入，然后通过套接字发送出去，同时，改程序也通过套接字接收对方发送的数据流。
我们可以调用某些语言提供的API等待标准输入，但是一旦这样做，就没有办法在套接字有数据的时候读出数据；我们也可以使用某些API等待套接字有数据返回，但是这样做，也没有办法在标准输入有数据的情况下，读入数据并发送给对方。
I/O多路复用意味着我们可以同时监视多个fd上的I/O，并在其中任何一个就绪时读取，而非阻塞I/O意味着在读取fd时，如果没有可用的，则立即返回一个错误，而不是等待/阻塞直到数据可用。

典型I/O多路复用实现

IO模型	相对性能	关键思路	操作系统	Java支持情况
select	较高	Reactor	windows/Linux	支持，Linux操作系统的 kernels 2.4内核版本之前，默认使用select；而目前windows下对同步IO的支持，都是select模型
poll	较高	Reactor	Linux	Linux下的JAVA NIO框架，Linux kernels 2.6内核版本之前使用poll进行支持。也是使用的Reactor模式
epoll	高	Reactor/Preactor	Linux	Linux kernels 2.6内核版本及以后使用epoll进行支持；Linux kernels 2.6内核版本之前使用poll进行支持；另外一定注意，由于Linux下没有Windows下的IOCP技术提供真正的异步IO 支持，所以Linux下使用epoll模拟异步IO
kqueue	较高	Preactor	Linux	目前Java不支持

select

select函数就是一种常见的I/O多路复用技术，使用select函数，通知内核挂起进程，当一个或多个I/O事件发生后，将控制权返还给应用程序，由应用程序进行I/O事件的处理。
缺点： 支持的文件描述符个数是有限的，在Linux系统中，select的默认最大值为1024.

select伪代码

有一个is_read(int fd)函数，来判断fd是否准备就绪。

//当fd 准备就绪时返回true
bool is_ready(int fd);struct fd_info {
int fd;
bool ready;
};
//fds：文件描述符集合
//max_fd：文件描述符个数
int select(set fds, int max_fd) {int ready_cnt = 0;while (ready_cnt == 0) {//遍历fd集合for (int i = 0; i < max_fd; i++) {//判断fd是否就绪if (is_ready(i)) {auto it = fds.find(i);//修改fd状态it->ready = true;ready_cnt++;}}}//返回就绪的文件描述符个数return ready_cnt;
}

调用伪代码

set fds;
while (1) {// 每次调用select前都需要重新初始化fds集合fds.clear();fds.inert({.fd = 1})fds.inert({.fd = 100})int ready_cnt = select(fds, /*max_fd=*/100 + 1);assert(ready_cnt > 0);for (int i = 0; i < fds.size(); i++) {if (fds[i].ready) {// Use fds[i].fd}}
}

poll

poll是另一种在各种Unix系统上被广泛支持的I/O多路复用技术，虽然名声没有select那么响，但能力一点不比select差，而且因为可以突破select文件描述符的个数限制，在高并发下及其占优势。当需要控制的fd不超过1000的情况下，则使用select也够了。

poll伪代码

// 当fd 准备就绪时返回true
bool is_ready(int fd);struct fd_info {int fd;bool ready;
};
//fd_info* fds：fd集合指针
//nfds：fd个数
int poll(struct fd_info* fds, int nfds) {int ready_cnt = 0;while(ready_cnt == 0) {for (int i = 0; i < nfds; i++) {if (is_ready(fds[i])) {fds[i].ready = true;ready_cnt++;} else {fds[i] = false;}}}return ready_cnt;
}

调用伪代码

// 只需要初始化一次
fd_info fds[2];
fds[0].fd = 1;
fds[1].fd = 100;int nfds = 2;while (1) {int ready_cnt = poll(fds, nfds);assert(ready_cnt > 0);for (int i = 0; i < nfds; i++) {if (fds[i].ready) {// Use fds[i].fd}}
}

epoll

事件触发机制

条件触发（Level-triggered LT）: poll和select都是使用条件触发，条件触发的意思是只要满足事件的条件，比如有数据需要读，就一直不断的把这个事件传递给用户。
边缘出发（Edge-triggered ET）： 边缘触发是epoll独创的，意思是只有第一次满足条件的时候才触发，之后就不会再传递同样的事件了。

epoll伪代码

add_monitor是用来触发外部的线程来监控all_fds将就绪的fd添加到read_fds中。

// Start monitoring fds in `all_fds` and constantly adds ready ones to
// `ready_fds`.
void add_monitor(const vector& all_fds, vector& ready_fds);struct fd_info {int fd;bool ready;
};struct epoll_info {vector all_fds;vector ready_fds;
};map epoll_info_by_epoll_id;// 创建epoll实例，并返回他的id
int epoll_create() {return epoll_info_by_epoll_fd.size();
}// 向epoll中添加监控fd
void epoll_add(int epoll_id, int fd) {epoll_info_by_epoll_id[epoll_id].push_back(fd);
}// 等待直到至少有一个fd就绪，返回就绪的fd个数
// Afte the function returns, the first `ready_cnt` of `ready_fds` contain
// ready fds. The rest can be ignored.
int epoll_wait(int epoll_id, struct fd_info* ready_fds) {int ready_cnt = 0;struct epoll_info info = epoll_info_by_epoll_id[epoll_id];add_monitor(info.allfds, info.ready_fds);while (ready_cnt == 0) {ready_cnt = ready_fds.size();for (int i = 0; i < ready_cnt; i++) {ready_fds[i].fd = ready_fds[i];ready_fds[i].ready = true;}}return ready_cnt;
}

不同于select和poll只有一个api，epoll提供了3个api；
调用epoll_create和epoll_add建立epoll实例，同时循环调用epoll_wait不断等待epoll_add添加的fds；
内部循环的复杂度是O(ready fds)。最坏情况的复杂度仍然是O(n)。然而，在可监视的就绪fd大多比要监视的fd少得多的情况下，epoll比poll具有更好的性能。换句话说，即使两个算法的复杂度都是O(n)，在现实中，n=3和n=10000可能关系很大。

调用者伪代码

int epoll_id = epoll_create();
epoll_add(epoll_id, 1);
epoll_add(epoll_id, 100);while (1) {struct fd_info fds[2];int ready_cnt = epoll_wait(epoll_id, fds);assert(ready_cnt > 0);for (int i = 0; i < ready_cnt; i++) {// Use fds[i].fd}
}

select、poll、epoll的区别

通过以上的伪代码就可以看出来三者的区别：

select和poll在使用者层面，select需要在调用前每次都重新初始化fds，而poll由于传递的是指针，则不需要每次初始化；
在实现上，伪代码并没有展示不同，但select是有fds上限的，上限为1024，而poll却没有；
poll和epoll，实现层面提供的api不同，epoll将获取就绪的fd交给另外的线程去处理，自己只需要关注就绪的fd集合，通知用户线程即可，而poll则需要自己遍历所有的fd，判断是否就绪，将就绪的fd返回给用户线程。

多路复用IO的优缺点

优点

不用再使用多线程进行IO处理了（包括操作系统内核IO管理模块和应用程序进程而言）。当然实际业务处理中，应用程序进程还可以引入线程池技术
同一个端口可以处理多种协议，例如，使用ServerSocketChannel服务器端口监听，既可以处理TCP协议又可以处理UDP协议。
操作系统级别的优化：多路复用IO及时可以是操作系统级别在一个端口上能够同时接受多个客户端的IO事件，同时具有之前我们讲到的阻塞式同步IO和非阻塞时同步IO的所有特点。Selector的一部分作用更相当于“”轮询代理器

缺点

都是同步IO：阻塞式IO、非阻塞时IO甚至包括多路复用IO，这些都是操作系统级别对“同步IO”的实现。

参考

select v.s. poll v.s. epoll
https://pdai.tech/md/java/io/java-io-nio-select-epoll.html

上一篇：微信小程序-读取数据

下一篇：Java搭建宝塔部署实战毕设项目WMS仓库管理系统源码

IO模型之I/O多路复用

什么是IO多路复用？

典型I/O多路复用实现

select

select伪代码

调用伪代码

poll

poll伪代码

调用伪代码

epoll

epoll伪代码

调用者伪代码

select、poll、epoll的区别

多路复用IO的优缺点

优点

缺点

参考

相关内容

热门资讯