ZT: Feng Dan's doctoral dissertation

Posted by jsj on 2004/10/30 12:27, Huazhong University of Science and Technology Alumni Forum (www.hust.org)


Feng Dan

 

Dissertation title: Research on Parallelism in External Storage Systems

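About the author: Feng Dan, female, born in 1970. In 1994 she began her studies under Professor Zhang Jiangling at Huazhong University of Science and Technology, and she received her doctorate in 1997.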

Abstract (Chinese version, translated)

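With the arrival of the information age, demands on the capacity and speed of information storage keep rising. Multimedia applications need large, fast storage systems: taking a film stored in the MPEG-2 standard as an example, a two-hour film needs nearly 5 GB of storage space and a data transfer rate of 10 Mbps. Multi-user transaction-processing environments need fast I/O to support real-time access, and some grand-challenge scientific computing problems go further and pursue a "3T" performance target for the computer system: 1 Teraflops of computing power, 1 Terabyte of main memory, and 1 Terabyte/s of I/O bandwidth. D.A. Patterson holds that work on computer system performance should shift from the traditional pursuit of CPU computing power (MIPS) to the pursuit of overall performance, and in particular of system I/O capability. Amdahl's law shows that the performance of a computer system is limited by its slowest component. If CPU speed rises 100-fold while I/O speed changes little, computer performance rises only 10-fold; that is, only 10% of the CPU speed improvement is used and the rest is wasted by the slow I/O system. I/O has thus become the "bottleneck" of computer performance. One of the most effective ways to solve the I/O bottleneck is parallel access, in particular parallelism in the low-level I/O operations of storage devices.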

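Research on parallel I/O receives great attention internationally and is a focus of current computer research. Aiming at building high-performance storage systems, this dissertation studies the parallelism of computer external storage systems in depth from several angles. Its original results fall mainly into the following areas.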

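(1) An analytical model of the communication mechanism of parallel storage systems (a stochastic Petri net model of the disk array) is established for the first time, providing a theoretical basis for the design of parallel storage systems.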

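The model is used to analyze how the communication mechanism between the array controller and the string controllers affects system performance. The findings are that the average utilization of the string controllers depends on the choice of communication mechanism (including the choice of communication bus and of communication method), and that a good communication mechanism helps improve the performance of the parallel storage system. The model is also used to compute RAID-5 disk drive utilization. Utilization under small writes turns out to be far lower than under large-block I/O requests, so I/O scheduling in a disk array should avoid small writes wherever possible. The remedy is an array buffer that gathers many small writes into one large write.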

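(2) The operating-system concept of multithreading is carried over to the lowest level of the storage system and used to process multiple storage devices on a single channel in parallel. A disk array using multithreaded I/O is built for the first time.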

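An I/O device on a SCSI bus can temporarily disconnect from the CPU while it performs a slow I/O operation and re-establish the connection once the operation completes; that is, it has the hardware "disconnect/reselect" capability. Using this capability, one I/O operation is divided into several phases, each corresponding to a thread, and multithreaded I/O scheduling of multiple storage devices on a SCSI bus is implemented at the lowest level of the storage system. This lets multiple disk drives be accessed in parallel and greatly increases access speed. The dissertation analyzes the effect of multithreaded scheduling and finds that when few I/Os are scheduled, the total I/O execution time is approximately equal to the service time of a single I/O, and that when more I/Os are scheduled (<10), the I/O service time can be hidden in the SCSI bus latency and data transfer. The SCSI channel can thus be fully used and the total I/O response time reduced.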

(3) An EVENODD code for correcting double disk failures, based on data-striped access, is proposed.

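Error-correcting codes that tolerate multiple disk failures are studied mainly to improve the reliability and availability of parallel storage systems. Accessing the redundancy information, however, inevitably adds redundant I/O operations and increases the overhead of parallel I/O scheduling. The dissertation compares three double-failure-correcting codes suitable for disk arrays (two-dimensional parity, the RS (Reed-Solomon) code, and the EVENODD code) in terms of encoding and decoding complexity and of I/O scheduling behavior under small writes. EVENODD encoding and decoding are relatively simple, but its small-write performance is poor and ill-suited to parallel I/O scheduling: on average, more than three rounds of parallel I/O scheduling are needed. If the striping technique of the disk array is combined with the coding, using the improved EVENODD scheme proposed by the author, small-write performance becomes optimal without any increase in encoding or decoding complexity: only two rounds of parallel I/O scheduling are needed, and system overhead is low.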

(4) Several new methods for reducing parallel I/O overhead are proposed, greatly increasing system access speed.

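The feasibility and timing characteristics of parallel operation of SCSI storage devices on multiple strings are analyzed, disk service queueing models are built for the various disk array architectures, and the I/O response time and maximum throughput of RAID-0, RAID-1, and RAID-5 are calculated and analyzed. On that basis, expressions for the I/O response time and maximum throughput of a typical disk array system are derived. Several new methods for reducing parallel I/O overhead are proposed, for example: shortening the instruction-fetch time of the string controllers to lower the parallel I/O overhead across strings, so that inter-string parallel overhead is below 0.5 ms; and combining the channel structure with hardware fault tolerance to speed up system response. These methods are embodied in the patent "A Disk Array System Integration Method" and in the disk array product built with independently owned rights. The average data transfer rate of the disk array reaches 15 MB/s, and the average access time is 0.3 ms.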

 

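Besides being described in detail in the dissertation, these results have been published in international and domestic journals including IEEE journals, the Journal of the Magnetics Society of Japan, and Acta Electronica Sinica (《电子学报》). During her doctoral studies and the year after receiving the degree, the author published 14 papers, with 10 index entries across the three major indexes SCI, EI, and ISTP. For the proposed improvement to the EVENODD code, she received German funding in 1997 to attend an IEEE international conference.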

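This research was supported by the National Natural Science Foundation of China key project "Fundamental Technology Research on Fast, Ultra-High-Density External Storage" (69133020) and the National Natural Science Foundation of China project "Research on the Architecture of Adaptive Parallel I/O Systems for Disk Arrays" (69503003). The two projects passed acceptance by the Foundation in 1996 and 1998 respectively, each with an overall rating of "A". The project "Theoretical and Experimental Research on Fast Ultra-High-Density Storage Technology", which the author helped complete, won a First Prize for Scientific and Technological Progress from the Ministry of Education in 1998 (the author ranked third; the first two were her supervising and co-supervising teachers). The same project won a fourth-class National Natural Science Award in 1999 (ranked third). The disk array product developed will be produced and brought to market under the organization of the Wuhan East Lake New Technology Development Zone.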

Keywords: storage system, parallel access, SCSI interface, disk array, Petri net, I/O response time, error-correcting code

Abstract*

With the arrival of the information age, demands on storage capacity and speed keep growing. Multimedia applications require large, fast storage systems: storing a two-hour movie in the MPEG-2 standard takes nearly 5 GB of storage capacity and a data transfer rate of 10 Mbps. Multi-user applications require high-speed I/O to support real-time access. Some scientific computing problems require a computer system with "3T" performance, namely 1 Teraflops of computing capability, 1 Terabyte of memory, and 1 Terabyte/s of I/O bandwidth. D.A. Patterson suggested that the goal should be to raise not only computing capability but the performance of the whole computer system, especially its I/O capability. Amdahl's law shows that improvement in computer performance is limited by the slowest unit in the system. If CPU capability increases 100-fold while I/O speed stays slow, system performance increases only 10-fold. The I/O system has become the bottleneck of computer systems. Parallel access is one of the most efficient ways to solve this I/O bottleneck, and the key technology is implementing parallel operations at the low I/O level.
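The 100-fold versus 10-fold figure is consistent with Amdahl's law if one assumes that I/O accounts for about 10% of the original run time. That fraction is an assumption made here for illustration; the abstract does not state it. A minimal Python sketch (the function name is illustrative):

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the run time is sped up
    by `factor` and the rest (here: I/O) is left unchanged."""
    unchanged = 1.0 - accelerated_fraction
    return 1.0 / (unchanged + accelerated_fraction / factor)


if __name__ == "__main__":
    # Assumed split: 90% CPU time, 10% I/O time (not given in the abstract).
    print(amdahl_speedup(0.9, 100))
```

Under this assumed split the speedup comes out a little above 9, and no CPU improvement can push it past 10 while the I/O share stays fixed.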

Parallel I/O is one of the major research areas in computer systems. With the goal of constructing a high-performance parallel storage system, this dissertation studies the parallelism of storage devices in detail. The main achievements are as follows.

1. An analysis model (a stochastic Petri net model of the parallel storage system) is set up for the first time to analyze the communication mechanism of the parallel storage system. It provides a theoretical foundation for the design of parallel storage systems.

Using this communication mechanism analysis model, the performance of different communication mechanisms between the disk array controller and the string controllers is analyzed. The study shows that the average utilization of the string controllers depends on the choice of communication mechanism (including the choice of communication bus and of communication method), and that a proper communication mechanism can greatly improve the performance of the parallel storage system. Using the model to calculate disk drive utilization for RAID level 5, we find that utilization under small writes is much lower than under large-block I/O requests. Small writes should therefore be avoided in the I/O scheduling of a disk array. Buffering, which combines many small writes into one large write, is an efficient way to do so.
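One way to see why small writes are costly on RAID level 5 is to count disk I/Os. The sketch below assumes the usual read-modify-write parity update (read old data, read old parity, write new data, write new parity) and compares it with a buffered full-stripe write. It is an illustration only, not the Petri net model of the dissertation, and all names and block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks):
    """RAID-5 style parity: XOR of all data blocks in the stripe."""
    return reduce(xor_blocks, blocks)


def small_write(stripe, old_parity, index, new_block):
    """Read-modify-write of one block. Returns (new_parity, disk_ios)."""
    old_block = stripe[index]  # I/O 1: read old data (old parity read is I/O 2)
    new_parity = xor_blocks(xor_blocks(old_parity, old_block), new_block)
    stripe[index] = new_block  # I/O 3: write data (parity write is I/O 4)
    return new_parity, 4


def full_stripe_write(new_blocks):
    """Buffered write of a whole stripe: no reads, one write per disk."""
    return parity(new_blocks), len(new_blocks) + 1


def demo():
    old = [bytes([i] * 4) for i in range(4)]       # 4 data blocks
    new = [bytes([i + 10] * 4) for i in range(4)]
    stripe, par, ios_small = list(old), parity(old), 0
    for i, block in enumerate(new):
        par, ios = small_write(stripe, par, i, block)
        ios_small += ios
    par_full, ios_full = full_stripe_write(new)
    return par == par_full, ios_small, ios_full
```

In this example, four separate small writes cost 16 disk I/Os, while the same data written as one full stripe costs 5 and yields the same parity.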

2. The multithreading concept from operating systems is transplanted to the low level of the storage system and used to process the multiple storage devices on one string in parallel. Multithreaded I/O is used to construct a disk array for the first time.

An I/O device on a SCSI bus can temporarily disconnect from the CPU while it performs a slow I/O operation and reconnect once the operation has finished; that is, a SCSI device supports "Disconnect/Reselect" with the initiator. Using this function, an I/O operation can be divided into phases that correspond to multiple threads, and multithreaded I/O scheduling of multiple storage devices within a SCSI string is implemented. It lets multiple disk drives perform I/O in parallel and greatly increases the access speed of the system. Analyzing the efficiency of multithreaded scheduling in light of the I/O characteristics, we find that multithreaded scheduling performs best with a high-speed SCSI bus. When the number of I/Os being scheduled is small, the overall I/O service time is nearly the same as that of a single I/O. When the number is large, the I/O service time is hidden in the SCSI bus delay and the data transfer time, so the SCSI channel can be fully used and the I/O response time is reduced.
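The two regimes can be reproduced with a very small timing model. The sketch below is a simplification assumed for illustration, not the analysis in the dissertation: each I/O goes to a different disk, the bus is needed only for the command and data transfer phases, and disks seek while disconnected. All timing values (in microseconds) are hypothetical.

```python
def serial_time(n, cmd, service, xfer):
    """Total time when each I/O holds the bus from command to completion."""
    return n * (cmd + service + xfer)


def overlapped_time(n, cmd, service, xfer):
    """Total time with disconnect/reselect: commands are issued back to back,
    each disk works while disconnected, then reselects for its data transfer."""
    bus_free = n * cmd                      # bus busy issuing all commands
    for i in range(n):
        ready = (i + 1) * cmd + service     # disk i finishes its internal work
        start = max(bus_free, ready)        # wait for both the disk and the bus
        bus_free = start + xfer             # data transfer occupies the bus
    return bus_free


if __name__ == "__main__":
    # Hypothetical timings in microseconds: command, disk service, transfer.
    for n in (1, 4, 32):
        print(n, serial_time(n, 100, 10000, 1000),
              overlapped_time(n, 100, 10000, 1000))
```

With these assumed timings, one I/O takes 11.1 ms either way, and four overlapped I/Os take 14.1 ms against 44.4 ms serially. At 32 I/Os the bus carries 35.2 ms of command and data traffic out of a 42.1 ms total, so most of the disk service time overlaps with bus activity.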

3. An improved EVENODD code based on data striping is proposed for tolerating double disk failures.

Error-correcting codes that tolerate double disk failures are studied in order to enhance the reliability and availability of the storage system. Accessing the error-correcting information, however, brings redundant I/O operations and increases the cost of parallel I/O scheduling. Three codes for tolerating double disk failures in disk arrays are compared: two-dimensional parity, the RS (Reed-Solomon) code, and the EVENODD code. The comparison covers their computational complexity and their I/O scheduling characteristics under small writes. For the EVENODD code, encoding and decoding are simple, but small-write performance is poor and unsuitable for parallel scheduling: a small write needs more than three rounds of parallel I/O scheduling. By combining the code with the striping technique of the disk array, the dissertation proposes a modified EVENODD scheme. With this scheme, small-write performance is optimal without increasing encoding or decoding complexity: a small write needs only two rounds of parallel I/O scheduling, so the scheduling cost is small.
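For intuition about the small-write penalty, the sketch below encodes the standard EVENODD layout (a prime p, p data columns, p-1 rows, one horizontal and one diagonal parity column) and counts how many parity symbols change when a single data symbol is updated. It covers only the unmodified code as it is commonly described; the modified scheme of the dissertation is not specified in the abstract and is not reproduced here. Function names are illustrative.

```python
def evenodd_encode(data, p):
    """data: (p-1) rows x p columns of integer symbols, p prime.
    Returns (horizontal_parity, diagonal_parity), each of length p-1."""
    rows = p - 1

    def cell(r, c):
        # Row p-1 is an imaginary all-zero row used by the diagonals.
        return data[r][c] if r < rows else 0

    horizontal = []
    for r in range(rows):
        h = 0
        for c in range(p):
            h ^= data[r][c]
        horizontal.append(h)

    # Adjuster S: XOR along the special diagonal where row + column == p - 1.
    s = 0
    for c in range(1, p):
        s ^= cell(p - 1 - c, c)

    diagonal = []
    for l in range(rows):
        d = s
        for c in range(p):
            d ^= cell((l - c) % p, c)   # cells with (row + column) % p == l
        diagonal.append(d)
    return horizontal, diagonal


def parity_symbols_touched(p, row, col):
    """Number of parity symbols that change when data[row][col] is flipped."""
    data = [[0] * p for _ in range(p - 1)]
    h0, d0 = evenodd_encode(data, p)
    data[row][col] = 1
    h1, d1 = evenodd_encode(data, p)
    return sum(a != b for a, b in zip(h0 + d0, h1 + d1))
```

For p = 5, updating a symbol off the special diagonal changes two parity symbols, while updating one on it changes the adjuster and therefore all five affected parity symbols, which is one source of the uneven small-write cost.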

4. Several new methods for reducing the cost of parallel I/O scheduling are proposed. With these methods, the access speed of the system is increased greatly.

The feasibility and timing characteristics of parallel operation of SCSI storage devices across multiple strings are analyzed. Disk service queueing models are set up for the different disk array architectures. The I/O response time and maximum throughput of RAID level 0, RAID level 1, and RAID level 5 are calculated and analyzed in detail. Based on this, formulas for the I/O response time and maximum throughput of a typical disk array are derived. Several new methods for reducing the cost of parallel I/O scheduling are proposed. For example, by reducing instruction fetch time, the cost of parallel I/O scheduling across multiple strings is reduced to less than 0.5 ms. These methods are embodied in a patent on a system integration method for constructing disk arrays and have been used in our disk array products. The average data transfer rate of the disk array is 15 MB/s and the average access time is 0.3 ms.
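The abstract does not give the queueing models themselves. As a rough illustration of the kind of calculation involved, the sketch below treats each disk as an independent M/M/1 queue and assumes requests are spread evenly across disks. Both are assumptions made here, as are the function names and the four-I/O cost of a RAID level 5 small write.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time of an M/M/1 queue (waiting plus service).
    arrival_rate in requests/s, service_time in seconds."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    return service_time / (1.0 - utilization)


def raid0_max_throughput(num_disks: int, service_time: float) -> float:
    """Upper bound in requests/s when each request occupies one disk."""
    return num_disks / service_time


def raid5_small_write_max_throughput(num_disks: int, service_time: float) -> float:
    """Upper bound when every small write costs four disk I/Os
    (read data, read parity, write data, write parity)."""
    return num_disks / (4.0 * service_time)


if __name__ == "__main__":
    # Hypothetical disk: 10 ms mean service time, 50 requests/s per disk.
    print(mm1_response_time(50, 0.010))
    print(raid0_max_throughput(5, 0.010))
    print(raid5_small_write_max_throughput(5, 0.010))
```

Under these assumptions a disk at 50% utilization has a mean response time of twice its service time, and a five-disk array handling only small writes sustains a quarter of the request rate of the same disks under RAID level 0.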

 

The achievements are described in detail in the dissertation and have been published in IEEE Transactions on Magnetics, the Journal of the Magnetics Society of Japan, and elsewhere. Fourteen papers have been published, ten of which are indexed in SCI, EI, or ISTP. For the proposed modified EVENODD code, the author received support from Germany to attend the IEEE Conference on Information Theory in 1997.

This research was supported by National Science Foundation of China projects under grant numbers 69133020 and 69503003. The two projects were completed in 1996 and 1998 respectively and were rated excellent at final acceptance by the Foundation. The project "Theoretical and Practical Research on the High Speed and Ultra High Density of Magnetic Storage Technology" received a first-class award from the National Department of Education in 1998 and a fourth-class National Natural Science award. The author ranked third among the contributors to the project. The disk array we developed will be produced and sold under the organization of the East Lake New Technology Development Zone of Wuhan.

Keywords: Storage System, Parallel Access, SCSI, Disk Array, Petri Net, I/O Response Time, Error Correcting Code

