[Repost] Linux Asynchronous I/O Explained
Source: cnblogs  Author: xuyaowen  Date: 2019/4/10 8:40:24
  1. Linux Asynchronous I/O Explained (Last updated: 13 Apr 2012)
  2. *******************************************************************************
  3. by Vasily Tarasov <tarasov AT vasily dot name>
  4.  
  5. Asynchronous I/O (AIO) is a method for performing I/O operations so that the
  6. process that issued an I/O request is not blocked till the data is available.
  7. Instead, after an I/O request is submitted, the process continues to execute
  8. its code and can later check the status of the submitted request.
  9.  
  10. The Linux kernel provides only *5* system calls for performing asynchronous I/O.
  11. Other AIO functions commonly described in the literature are implemented in the
  12. user space libraries and use the system calls internally. Some libraries can
  13. also emulate AIO functionality entirely in the user space without any kernel
  14. support.
  15.  
  16. There are two main libraries in Linux that facilitate AIO; we will refer to
  17. them as *libaio* and *librt* (the latter is a part of libc).
  18. 
  19. In this text, I first discuss the system calls, then libaio, and finally librt.
  20.  
  21. AIO System Calls
  22. *******************************************************************************
  23. based on Linux 3.2.1 kernel
  24.  
  25. AIO system call entry points are located in "fs/aio.c" file in the kernel's
  26. source code. Types and constants exported to the user space reside in
  27. "/usr/include/linux/aio_abi.h" header file.
  28.  
  29. There are only 5 AIO system calls:
  30.  
  31. * int io_setup(unsigned nr_events, aio_context_t *ctxp);
  32.  
  33. * int io_destroy(aio_context_t ctx);
  34.  
  35. * int io_submit(aio_context_t ctx, long nr, struct iocb *cbp[]);
  36.  
  37. * int io_cancel(aio_context_t ctx, struct iocb *, struct io_event *result);
  38.  
  39. * int io_getevents(aio_context_t ctx, long min_nr, long nr,
  40. struct io_event *events, struct timespec *timeout);
  41.  
  42. I will demonstrate the usage of these system calls using a sequence of programs
  43. in the increasing order of their complexity.
  44.  
  45. Program 1:
  46.  
  47. >> snip start: 1.c >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
  48.  
  49. 00 #define _GNU_SOURCE /* syscall() is not POSIX */
  50. 01
  51. 02 #include <stdio.h> /* for perror() */
  52. 03 #include <unistd.h> /* for syscall() */
  53. 04 #include <sys/syscall.h> /* for __NR_* definitions */
  54. 05 #include <linux/aio_abi.h> /* for AIO types and constants */
  55. 06
  56. 07 static inline int io_setup(unsigned nr, aio_context_t *ctxp)
  57. 08 {
  58. 09     return syscall(__NR_io_setup, nr, ctxp);
  59. 10 }
  60. 11
  61. 12 static inline int io_destroy(aio_context_t ctx)
  62. 13 {
  63. 14     return syscall(__NR_io_destroy, ctx);
  64. 15 }
  65. 16
  66. 17 int main(void)
  67. 18 {
  68. 19     aio_context_t ctx;
  69. 20     int ret;
  70. 21
  71. 22     ctx = 0; /* must be zero before io_setup() */
  72. 23
  73. 24     ret = io_setup(128, &ctx);
  74. 25     if (ret < 0) {
  75. 26         perror("io_setup error");
  76. 27         return -1;
  77. 28     }
  78. 29
  79. 30     ret = io_destroy(ctx);
  80. 31     if (ret < 0) {
  81. 32         perror("io_destroy error");
  82. 33         return -1;
  83. 34     }
  84. 35
  85. 36     return 0;
  86. 37 }
  87.  
  88. << snip end: 1.c <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  89.  
  90. For now, ignore the first 17 lines of the code and look at the main() function. In
  91. line 24 we call the io_setup() system call to create a so-called "AIO context" in
  92. the kernel. An AIO context is a set of data structures that the kernel maintains to
  93. perform AIO. Every process can have multiple AIO contexts, and so one
  94. needs an identifier for every AIO context in a process (XXX: come up with a
  95. handy example how it can be used). The ctx variable of type aio_context_t, defined in
  96. line 19, stores such an identifier in our example. A pointer to the ctx variable
  97. is passed to io_setup() as the second argument, and the kernel fills this variable
  98. with a context identifier. Interestingly, aio_context_t is actually just an
  99. unsigned long, defined in the kernel ("linux/aio_abi.h") like this:
  100.  
  101. typedef unsigned long aio_context_t;
  102.  
  103. In line 22 we set ctx to 0, which is required by the kernel; otherwise io_setup()
  104. fails with an -EINVAL error.
  105.  
  106. The first argument of the io_setup() function - 128 in our case - is the maximum
  107. number of requests that can simultaneously reside in the context. This will be
  108. explained in more detail in the next examples.
  109. 
  110. In line 30 we destroy the just-created AIO context by calling the io_destroy() system
  111. call with ctx as an argument.
  112.  
  113. The lines above 17 are just helpers that allow us to invoke system calls directly. We
  114. use glibc's syscall() function, which invokes any system call by its number. It
  115. is only required if one wants to call system calls directly without using the AIO
  116. libraries' wrapper functions (provided by libaio and librt). Notice that the
  117. syscall() function's return value follows the usual conventions for indicating
  118. an error: -1, with errno set to a positive value that indicates the error.
  119. So, we check if the values returned by io_setup() and io_destroy() are less than
  120. zero to detect the error, and then use the perror() function, which prints a
  121. message describing errno.
  122.  
  123. In the last example we did the bare minimum: created an AIO context and then
  124. destroyed it immediately. In the next example we submit one request to the
  125. context and then query its status later.
  126.  
  127. Program 2:
  128.  
  129. >> snip start: 2.c >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
  130.  
  131. 00 #define _GNU_SOURCE /* syscall() is not POSIX */
  132. 01
  133. 02 #include <stdio.h> /* for perror() */
  134. 03 #include <unistd.h> /* for syscall() */
  135. 04 #include <sys/syscall.h> /* for __NR_* definitions */
  136. 05 #include <linux/aio_abi.h> /* for AIO types and constants */
  137. 06 #include <fcntl.h> /* O_RDWR */
  138. 07 #include <string.h> /* memset() */
  139. 08 #include <inttypes.h> /* uint64_t, uintptr_t */
  140. 09 #include <time.h> /* struct timespec */
  141. 10 static inline int io_setup(unsigned nr, aio_context_t *ctxp)
  142. 11 {
  143. 12     return syscall(__NR_io_setup, nr, ctxp);
  144. 13 }
  145. 14
  146. 15 static inline int io_destroy(aio_context_t ctx)
  147. 16 {
  148. 17     return syscall(__NR_io_destroy, ctx);
  149. 18 }
  150. 19
  151. 20 static inline int io_submit(aio_context_t ctx, long nr, struct iocb **iocbpp)
  152. 21 {
  153. 22     return syscall(__NR_io_submit, ctx, nr, iocbpp);
  154. 23 }
  155. 24
  156. 25 static inline int io_getevents(aio_context_t ctx, long min_nr, long max_nr,
  157. 26         struct io_event *events, struct timespec *timeout)
  158. 27 {
  159. 28     return syscall(__NR_io_getevents, ctx, min_nr, max_nr, events, timeout);
  160. 29 }
  161. 30
  162. 31 int main(void)
  163. 32 {
  164. 33     aio_context_t ctx;
  165. 34     struct iocb cb;
  166. 35     struct iocb *cbs[1];
  167. 36     char data[4096];
  168. 37     struct io_event events[1];
  169. 38     int ret;
  170. 39     int fd;
  171. 40
  172. 41     fd = open("/tmp/testfile", O_RDWR | O_CREAT, 0644); /* O_CREAT needs a mode */
  173. 42     if (fd < 0) {
  174. 43         perror("open error");
  175. 44         return -1;
  176. 45     }
  177. 46
  178. 47     ctx = 0;
  179. 48
  180. 49     ret = io_setup(128, &ctx);
  181. 50     if (ret < 0) {
  182. 51         perror("io_setup error");
  183. 52         return -1;
  184. 53     }
  185. 54
  186. 55     /* setup I/O control block */
  187. 56     memset(&cb, 0, sizeof(cb));
  188. 57     cb.aio_fildes = fd;
  189. 58     cb.aio_lio_opcode = IOCB_CMD_PWRITE;
  190. 59
  191. 60     /* command-specific options */
  192. 61     cb.aio_buf = (uint64_t)(uintptr_t)data;
  193. 62     cb.aio_offset = 0;
  194. 63     cb.aio_nbytes = 4096;
  195. 64
  196. 65     cbs[0] = &cb;
  197. 66
  198. 67     ret = io_submit(ctx, 1, cbs);
  199. 68     if (ret != 1) {
  200. 69         if (ret < 0)
  201. 70             perror("io_submit error");
  202. 71         else
  203. 72             fprintf(stderr, "could not submit IOs\n");
  204. 73         return -1;
  205. 74     }
  206. 75
  207. 76     /* get the reply */
  208. 77     ret = io_getevents(ctx, 1, 1, events, NULL);
  209. 78     printf("%d\n", ret);
  210. 79
  211. 80     ret = io_destroy(ctx);
  212. 81     if (ret < 0) {
  213. 82         perror("io_destroy error");
  214. 83         return -1;
  215. 84     }
  216. 85
  217. 86     return 0;
  218. 87 }
  219.  
  220. << snip end: 2.c <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  221.  
  222. Every I/O request that is submitted to an AIO context is represented by an I/O
  223. control block structure - struct iocb - declared in line 34. We initialize this
  224. structure in lines 55-63. First, the whole structure is zeroed, then the file
  225. descriptor (aio_fildes) and command (aio_lio_opcode) fields are set.
  226. 
  227. The file descriptor corresponds to a previously opened file; in our example we
  228. open the "/tmp/testfile" file in line 41.
  229. 
  230. AIO commands currently supported by the Linux kernel are:
  231.  
  232. IOCB_CMD_PREAD
  233. positioned read; corresponds to pread() system call.
  234.  
  235. IOCB_CMD_PWRITE
  236. positioned write; corresponds to pwrite() system call.
  237.  
  238. IOCB_CMD_FSYNC
  239. sync file's data and metadata with disk; corresponds to fsync() system call.
  240.  
  241. IOCB_CMD_FDSYNC
  242. sync file's data and metadata with disk, but only metadata needed to access
  243. modified file data is written; corresponds to fdatasync() system call.
  244.  
  245. IOCB_CMD_PREADV
  246. vectored positioned read, sometimes called "scattered input";
  247. corresponds to preadv() system call.
  248. 
  249. IOCB_CMD_PWRITEV
  250. vectored positioned write, sometimes called "gathered output";
  251. corresponds to pwritev() system call.
  252.  
  253. IOCB_CMD_NOOP
  254. defined in the header file, but is not used anywhere else in the kernel.
  255.  
  256. The semantics of the other fields in the iocb structure depend on the command
  257. specified. For now, we will limit our discussion to the IOCB_CMD_PREAD and
  258. IOCB_CMD_PWRITE commands. After understanding the AIO interface for these two
  259. commands, we will look into the remaining ones.
  260. 
  261. In lines 60-63 of our running example we set the command-specific fields of the iocb
  262. structure: aio_buf and aio_nbytes correspond to the region in memory that
  263. data should be read into or written from; aio_offset is an absolute offset in the file.
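
The field assignments described above can be collected into one small helper. This is only a sketch: fill_iocb_rw() is a hypothetical name of my own (it is not part of the kernel ABI or of libaio), and it assumes the kernel's struct iocb from linux/aio_abi.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/aio_abi.h>

/*
 * Hypothetical helper (not a kernel or libaio API): prepare an iocb for a
 * positioned read or write. opcode must be IOCB_CMD_PREAD or IOCB_CMD_PWRITE.
 */
static void fill_iocb_rw(struct iocb *cb, int opcode, int fd,
                         void *buf, size_t nbytes, long long offset)
{
    assert(opcode == IOCB_CMD_PREAD || opcode == IOCB_CMD_PWRITE);

    memset(cb, 0, sizeof(*cb));                    /* zero all unused fields */
    cb->aio_fildes     = fd;                       /* previously opened file */
    cb->aio_lio_opcode = opcode;                   /* the AIO command */
    cb->aio_buf        = (uint64_t)(uintptr_t)buf; /* start of the memory region */
    cb->aio_nbytes     = nbytes;                   /* length of the memory region */
    cb->aio_offset     = offset;                   /* absolute offset in the file */
}
```

With such a helper, lines 55-63 of Program 2 would shrink to a single call such as fill_iocb_rw(&cb, IOCB_CMD_PWRITE, fd, data, 4096, 0).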
  264.  
  265. Now that one I/O control block is ready, we put a pointer to it in an array
  266. (line 65) and then pass this array to the io_submit() system call (line 67).
  267. io_submit() takes the AIO context ID, the size of the array and the array itself as
  268. arguments. Notice that the array should contain *pointers* to the iocb structures,
  269. not the structures themselves.
  270.  
  271. io_submit()'s return code can be one of the following values:
  272. 
  273. A) ret = (number of iocbs passed)
  274. Ideal case, all iocbs were accepted for processing.
  275. 
  276. B) 0 < ret < (number of iocbs passed)
  277. The io_submit() system call processes iocbs one by one, starting from
  278. the first entry in the passed array. If submission of some iocb fails,
  279. it stops there and returns the number of iocbs accepted so far, which is
  280. also the index of the iocb that failed. There is no way to learn the exact
  281. reason for the failure. However, if the very first iocb fails, see point C.
  282. 
  283. C) ret < 0
  284. There are two reasons why this could happen:
  285. 1) Some error happened even before io_submit() started to iterate
  286. over iocbs in the array (e.g., the AIO context was invalid).
  287. 2) The submission of the very first iocb (cbs[0]) failed.
  288.  
  289. So, in our example, we handle io_submit()'s return code in an unusual way. If
  290. the return code is less than zero, then we know the error (errno is set) and
  291. use the perror() function. Otherwise, if the return code is not equal to the
  292. number of iocbs, that is still a clear error, but we don't know its reason
  293. (errno is not set). Consequently, we use the fprintf(stderr, ...) function to
  294. print an error notification on the screen. Notice that in the case of a single iocb
  295. in the array (as in our example) such complex error handling makes less sense:
  296. if the first (and only) iocb fails, we are guaranteed to get the error
  297. information (see point C above). We handle errors in a more complex way in this
  298. example only to reuse the same code later, when we submit multiple iocbs in a
  299. single io_submit() call.
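
The three cases A, B and C can be told apart with a few comparisons. Below is a minimal sketch; the names submit_outcome, classify_submit() and report_submit() are hypothetical and exist only for illustration. It assumes ret is the value returned by the syscall()-based io_submit() wrapper, so a negative value means errno is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names, used only in this sketch. */
enum submit_outcome { SUBMIT_ALL, SUBMIT_PARTIAL, SUBMIT_ERROR };

/* Map io_submit()'s return value to cases A, B and C from the text. */
static enum submit_outcome classify_submit(long ret, long nr)
{
    assert(nr > 0);
    if (ret < 0)
        return SUBMIT_ERROR;    /* case C: errno describes the error */
    if (ret < nr)
        return SUBMIT_PARTIAL;  /* case B: iocb number ret was rejected */
    return SUBMIT_ALL;          /* case A: everything was accepted */
}

/* Print a diagnostic that matches the amount of information we have. */
static void report_submit(long ret, long nr)
{
    switch (classify_submit(ret, nr)) {
    case SUBMIT_ALL:
        break;
    case SUBMIT_PARTIAL:
        fprintf(stderr, "only %ld of %ld iocbs submitted, iocb %ld failed\n",
                ret, nr, ret);
        break;
    case SUBMIT_ERROR:
        fprintf(stderr, "io_submit error: %s\n", strerror(errno));
        break;
    }
}
```

In the partial case a caller could, for instance, inspect the iocb at index ret and resubmit the rest of the array; whether that is appropriate depends on the application.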
  300.  
  301. After the iocb is submitted, we can perform any other actions without waiting for the I/O
  302. to complete. For every completed I/O request (successful or unsuccessful)
  303. the kernel creates an io_event structure. To obtain the list of io_events (and
  304. consequently all completed iocbs) the io_getevents() system call should be used (line
  305. 77). When calling io_getevents(), one needs to specify:
  306. 
  307. a) which AIO context to get events from (ctx variable)
  308. 
  309. b) a buffer where the kernel should load events to (events variable)
  310. 
  311. c) the minimum number of events one wants to get (first 1 in our program).
  312. If fewer than this number of iocbs are currently completed,
  313. io_getevents() will block till enough events appear. See point e)
  314. for more details on how to control blocking time.
  315. d) the maximum number of events one wants to get. This usually is
  316. the size of the events buffer (second 1 in our program)
  317. 
  318. e) If not enough events are available, we don't want to wait forever.
  319. One can specify a relative timeout as the last argument.
  320. NULL in this case means to wait indefinitely.
  321. If one wants io_getevents() not to block at all, then the
  322. timespec timeout structure needs to be initialized to zero
  323. seconds and zero nanoseconds.
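
Point e) can be sketched as follows. timeout_from_ms() and reap_events() are hypothetical helper names of my own; the sketch assumes the raw system call is reached through glibc's syscall(), as in the programs above. Passing 0 milliseconds yields the all-zero timespec, i.e. a non-blocking poll.

```c
#define _GNU_SOURCE            /* syscall() is not POSIX */
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Hypothetical helper: build the relative timeout for io_getevents(). */
static struct timespec timeout_from_ms(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/*
 * Hypothetical helper: wait at most wait_ms milliseconds for at least one
 * event and return up to max_nr of them. wait_ms == 0 never blocks.
 */
static long reap_events(aio_context_t ctx, struct io_event *events,
                        long max_nr, long wait_ms)
{
    struct timespec ts = timeout_from_ms(wait_ms);

    return syscall(__NR_io_getevents, ctx, 1L, max_nr, events, &ts);
}
```

A polling loop would call reap_events(ctx, events, 1, 0) between other work, while a call with a NULL timeout (as in line 77 of Program 2) waits until the request completes.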
  324.  
  325. The return code of io_getevents() can be:
  326. 
  327. A) ret = (max number of events)
  328. All events that fit in the user-provided buffer were obtained
  329. from the kernel. There might be more pending events in the kernel.
  330. B) (min number of events) <= ret <= (max number of events)
  331. All currently available events were read from the kernel and no
  332. further waiting was needed.
  333. C) 0 < ret < (min number of events)
  334. All currently available events were read from the kernel, and
  335. we blocked until the timeout the user specified expired.
  336. D) ret = 0
  337. No events are available: the timeout expired (or min_nr was 0) before any request completed.
  338. 
  339. E) ret < 0
  340. An error happened.
  341.  
  342.  
  343. TO BE CONTINUED...
  344.  
  345.  
  346. /proc/sys/fs/aio-max-nr
  347. /proc/sys/fs/aio-nr
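
These two files can be read like any other text file. As far as I understand the kernel's sysctl documentation, aio-nr is the system-wide total of events reserved by io_setup() calls and aio-max-nr is the upper limit for that total. The helper names below are hypothetical, and the sketch simply reports when the files are not readable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one unsigned integer from a file.
 * Returns 0 on success and -1 on any failure. */
static int read_ulong_file(const char *path, unsigned long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print how much of the system-wide AIO event budget is in use. */
static void print_aio_limits(void)
{
    unsigned long nr, max_nr;

    if (read_ulong_file("/proc/sys/fs/aio-nr", &nr) == 0 &&
        read_ulong_file("/proc/sys/fs/aio-max-nr", &max_nr) == 0)
        printf("AIO events reserved: %lu of %lu\n", nr, max_nr);
    else
        printf("AIO limits are not readable on this system\n");
}
```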
  348.  
  349. Note that timeout is relative and will be updated if not NULL and the operation
  350. blocks
  351.  
  352. Check how vectors are provided to the vectored PREADV and PWRITEV commands.
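
A sketch of one answer, based on how libaio's io_prep_preadv() fills the structure: for the vectored commands, aio_buf points to the array of struct iovec and aio_nbytes holds the number of array entries rather than a byte count. fill_iocb_preadv() is a hypothetical name, and the snippet uses the kernel's struct iocb from linux/aio_abi.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <linux/aio_abi.h>

/* Hypothetical helper: prepare an iocb for a vectored positioned read. */
static void fill_iocb_preadv(struct iocb *cb, int fd,
                             const struct iovec *iov, int iovcnt,
                             long long offset)
{
    assert(iovcnt > 0);

    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes     = fd;
    cb->aio_lio_opcode = IOCB_CMD_PREADV;
    cb->aio_buf        = (uint64_t)(uintptr_t)iov; /* the iovec array itself */
    cb->aio_nbytes     = iovcnt;                   /* number of iovec entries */
    cb->aio_offset     = offset;                   /* absolute offset in the file */
}
```

A write variant would differ only in the opcode (IOCB_CMD_PWRITEV).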
  353.  
  354. Other fields to fill/explain:
  355. 
  356.     /* these are internal to the kernel/libc. */
  357.     __u64 aio_data; /* data to be returned in event's data */
  358.     __u32 PADDED(aio_key, aio_reserved1);
  359.             /* the kernel sets aio_key to the req # */
  360. 
  361.     /* common fields */
  362. +++ __u16 aio_lio_opcode; /* see IOCB_CMD_ above */
  363.     __s16 aio_reqprio;
  364.     __u32 aio_fildes;
  365. 
  366.     __u64 aio_buf;
  367.     __u64 aio_nbytes;
  368.     __s64 aio_offset;
  369. 
  370.     /* extra parameters */
  371.     __u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */
  372. 
  373.     /* flags for the "struct iocb" */
  374.     __u32 aio_flags;
  375. 
  376.     /*
  377.      * if the IOCB_FLAG_RESFD flag of "aio_flags" is set, this is an
  378.      * eventfd to signal AIO readiness to
  379.      */
  380.     __u32 aio_resfd;
  381.  
  382. *** SYNC RELATED COMMANDS ***
  383. IOCB_CMD_FSYNC
  384. sync file's data and metadata with disk; corresponds to fsync() system call.
  385.  
  386. IOCB_CMD_FDSYNC
  387. sync file's data and metadata with disk, but only metadata needed to access
  388. modified file data is written; corresponds to fdatasync() system call.
  389.  
  390.  
  391. *** VECTORED INPUT and OUTPUT ***
  392. IOCB_CMD_PREADV
  393. vectored positioned read, sometimes called "scattered input";
  394. corresponds to preadv() system call.
  395. 
  396. IOCB_CMD_PWRITEV
  397. vectored positioned write, sometimes called "gathered output";
  398. corresponds to pwritev() system call.
  399.  
  400. *** OTHER COMMANDS ***
  401. IOCB_CMD_NOOP
  402. defined in the header file, but is not used anywhere else in the kernel.
  403.  
  404. XXX: Maybe discuss poll and other semi-existing commands here?...
  405.  
  406. *********************************************************
  407. ********************* LIBAIO LIBRARY ********************
  408. *********************************************************
  409.  
  410. libaio:
  411. /lib64/libaio.so.1 (shared library)
  412.  
  413. libaio-devel:
  414. /usr/include/libaio.h (header file)
  415. /usr/lib64/libaio.a (static library)
  416.  
  417. Functions:
  418.  
  419. a) Actual system call wrappers:
  420.  
  421. int io_setup(int maxevents, io_context_t *ctxp);
  422. int io_destroy(io_context_t ctx);
  423. int io_submit(io_context_t ctx, long nr, struct iocb *ios[]);
  424. int io_cancel(io_context_t ctx, struct iocb *iocb, struct io_event *evt);
  425. int io_getevents(io_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);
  426.  
  427. io_context_t is a pointer to a non-existent structure:
  428.  
  429. typedef struct io_context *io_context_t;
  430.  
  431. Not a single line of code in any user tool or in the libaio library looks at the
  432. members of 'struct io_context'. So, gcc happily compiles the code even though
  433. struct io_context is never defined. This structure is probably declared just for
  434. type checking. The rule of thumb when using libaio is just to declare all
  435. variables as io_context_t and forget that it actually is a pointer!
  436. 
  437. b) Convenience inline functions:
  438.  
  439. static inline void io_prep_pread(struct iocb *iocb, int fd, void *buf, size_t count, long long offset)
  440. static inline void io_prep_pwrite(struct iocb *iocb, int fd, void *buf, size_t count, long long offset)
  441. static inline void io_prep_preadv(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset)
  442. static inline void io_prep_pwritev(struct iocb *iocb, int fd, const struct iovec *iov, int iovcnt, long long offset)
  443.  
  444. static inline void io_prep_poll(struct iocb *iocb, int fd, int events)
  445. static inline void io_prep_fsync(struct iocb *iocb, int fd)
  446. static inline void io_prep_fdsync(struct iocb *iocb, int fd)
  447.  
  448. static inline int io_poll(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd, int events)
  449. static inline int io_fsync(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd)
  450. static inline int io_fdsync(io_context_t ctx, struct iocb *iocb, io_callback_t cb, int fd)
  451.  
  452. static inline void io_set_eventfd(struct iocb *iocb, int eventfd);
  453.  
  454. *********************************************************
  455. ******** MATCHING LIBAIO AND KERNEL INTERFACE ***********
  456. *********************************************************
  457.  
  458. libaio.h redefines some of the kernel definitions (God knows why),
  459. but they match at the binary level. E.g., this is the kernel's
  460. exported definition of iocb:
  461.  
  462. struct iocb {
  463.     /* these are internal to the kernel/libc. */
  464.     __u64 aio_data; /* data to be returned in event's data */
  465.     __u32 PADDED(aio_key, aio_reserved1);
  466.             /* the kernel sets aio_key to the req # */
  467. 
  468.     /* common fields */
  469.     __u16 aio_lio_opcode; /* see IOCB_CMD_ above */
  470.     __s16 aio_reqprio;
  471.     __u32 aio_fildes;
  472. 
  473.     __u64 aio_buf;
  474.     __u64 aio_nbytes;
  475.     __s64 aio_offset;
  476. 
  477.     /* extra parameters */
  478.     __u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */
  479. 
  480.     /* flags for the "struct iocb" */
  481.     __u32 aio_flags;
  482. 
  483.     /*
  484.      * if the IOCB_FLAG_RESFD flag of "aio_flags" is set, this is an
  485.      * eventfd to signal AIO readiness to
  486.      */
  487.     __u32 aio_resfd;
  488. }; /* 64 bytes */
  489.  
  490. And this is the definition of iocb in libaio.h:
  491.  
  492. struct io_iocb_common {
  493.     PADDEDptr(void *buf, __pad1);
  494.     PADDEDul(nbytes, __pad2);
  495.     long long offset;
  496.     long long __pad3;
  497.     unsigned flags;
  498.     unsigned resfd;
  499. }; /* result code is the amount read or -'ve errno */
  500. 
  501. 
  502. struct iocb {
  503.     PADDEDptr(void *data, __pad1); /* Return in the io completion event */
  504.     PADDED(unsigned key, __pad2); /* For use in identifying io requests */
  505. 
  506.     short aio_lio_opcode;
  507.     short aio_reqprio;
  508.     int aio_fildes;
  509. 
  510.     union {
  511.         struct io_iocb_common c;
  512.         struct io_iocb_vector v;
  513.         struct io_iocb_poll poll;
  514.         struct io_iocb_sockaddr saddr;
  515.     } u;
  516. };
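
The binary match can be checked by comparing field offsets. The sketch below verifies only the kernel side (linux/aio_abi.h), because both headers define a type named struct iocb and cannot be included in the same file. The expected offsets are derived by hand from the two definitions above: data and key each occupy 8 bytes, the opcode, priority and descriptor fill the next 8, and the union starts at offset 24. check_kernel_iocb_layout() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/aio_abi.h>

/* Hypothetical helper: assert the layout that libaio's struct iocb must match. */
static void check_kernel_iocb_layout(void)
{
    assert(sizeof(struct iocb) == 64);
    assert(offsetof(struct iocb, aio_data) == 0);        /* libaio: data  */
    assert(offsetof(struct iocb, aio_lio_opcode) == 16);
    assert(offsetof(struct iocb, aio_reqprio) == 18);
    assert(offsetof(struct iocb, aio_fildes) == 20);
    assert(offsetof(struct iocb, aio_buf) == 24);        /* libaio: u.c.buf    */
    assert(offsetof(struct iocb, aio_nbytes) == 32);     /* libaio: u.c.nbytes */
    assert(offsetof(struct iocb, aio_offset) == 40);     /* libaio: u.c.offset */
    assert(offsetof(struct iocb, aio_flags) == 56);      /* libaio: u.c.flags  */
    assert(offsetof(struct iocb, aio_resfd) == 60);      /* libaio: u.c.resfd  */
}
```

The same set of offsetof() checks, with the libaio field names from the comments, could be compiled in a separate file against libaio.h to confirm the other half.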
  517.  
  518.  
  519.  
  520.  
  521. ****** AIO LIBRARY *****
  522.  
  523. glibc:
  524. /lib64/librt.so.1
  525.  
  526. glibc-headers:
  527. /usr/include/aio.h
  528.  
  529. Provides the POSIX-defined interface for async I/O.
  530.  
  531. aio_read()
  532. aio_write()
  533. aio_cancel()
  534. aio_error()
  535. aio_fsync()
  536. aio_suspend()
  537. aio_return()
  538.  
  539. lio_listio()
  540.  
  541.  
  542. ****** To discover ****
  543. XXX: see if these are implemented in some other kernels:
  544. /* These two are experimental.
  545. * IOCB_CMD_PREADX = 4,
  546. * IOCB_CMD_POLL = 5,
  547. */
  548. XXX: potential resubmission of the wrong iocb, knowing its index.
  549. XXX: two AIO contexts per process?
  550. Original link: https://www.fsl.cs.sunysb.edu/~vass/linux-aio.txt

Source link (cnblogs): http://www.cnblogs.com/xuyaowen/p/linux-aio.html