First Workshop on Network Coding and Data Storage

Hong Kong, July 21-22, 2011


Program
Day 1 (July 21)
8:45-9:00 Opening by John Lui and Raymond Yeung, Workshop Co-Chairs video
AM Session Chair: Raymond Yeung
9:00-10:15 Kannan Ramchandran, University of California, Berkeley
Network Codes for Next-Generation Distributed Storage Systems: Opportunities and Challenges
slides
video
10:15-10:45 Tea Break
10:45-12:45 Alex Dimakis, University of Southern California
Tutorial on Distributed Storage Problems and Regenerating Codes
slides
video
12:45-14:30 Lunch
PM Session Chair: Raymond Yeung
14:30-15:45 Patrick Lee, The Chinese University of Hong Kong
Design and Implementation of a Network-Coding-Based Distributed File System
slides
video 1 video 2
15:45-16:15 Tea Break
16:15-17:30 Patrick Lee, The Chinese University of Hong Kong
Design and Implementation of a Network-Coding-Based Distributed File System (cont.)
video 1 video 2
Day 2 (July 22)
AM Session Chair: Alex Dimakis
8:45-9:45 P. Vijay Kumar, Indian Institute of Science
Efficient Code Constructions for Reliable Distributed Storage
slides
video 1 video 2 video 3 video 4
9:45-10:45 Yinlong Xu, University of Science and Technology of China
A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation
slides
video 1 video 2 video 3
10:45-11:15 Tea Break
11:15-12:15 Kenneth Shum, The Chinese University of Hong Kong
Cooperative Regenerating Codes for Distributed Storage Systems
slides
video 1 video 2
12:15-14:00 Lunch
PM Session Chair: Kannan Ramchandran
14:00-15:00 Salim El Rouayheb, University of California, Berkeley
Data Security in Distributed Storage Systems
slides
video
15:00-16:00 James She, Hong Kong University of Science and Technology
Making Network Coding Social For Better Data Storage and Content Delivery
slides
video 1 video 2 video 3
16:00-16:30 Tea Break
16:30-17:30 Yuchong Hu, The Chinese University of Hong Kong
The Scaling Problem for Cloud Storage
slides
video 1 video 2
Closing by Raymond Yeung video


Abstracts of Talks

Alex Dimakis (University of Southern California)

Title: Tutorial on Distributed Storage Problems and Regenerating Codes

Abstract: We survey the recent developing theory on regenerating codes. Both lower via information theoretic cut-set arguments and constructive techniques will be developed in a tutorial way. We will also discuss several open directions including optimization of storage systems with different cost metrics, minimum disk IO and optimized storage allocation problems.


Kannan Ramchandran (University of California, Berkeley)

Title: Network Codes for Next-Generation Distributed Storage Systems: Opportunities and Challenges

Abstract: Today’s explosive growth in data centers and cloud storage is driving the critical need for gracefully scaling and robustly maintaining these systems in order to avoid the kind of meltdowns such as those recently experienced by Amazon and others. Such meltdowns may be inevitable if we continue to deal with system growth and node failures as an afterthought rather than incorporating them upfront in system design. It is desirable for next-generation massively scalable and robust systems to include, in addition to centrally-managed “cloud” components, a mix of on-the-fly deploy-as-needed organically grown network nodes that can help sustain scale and ensure robustness to node churn and failures. In this talk, we will overview some of the key system challenges and opportunities, highlight the critical role of network coding, and summarize our recent and ongoing work in this area.


Yinlong Xu (University of Science and Technology of China)

Title: A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation

Abstract: The current parallel storage systems use thousands of inexpensive disks to meet the storage requirement of applications. Data redundancy and/or coding are used to enhance data availability, e.g., Row-diagonal parity (RDP) and EVENODD codes, which are widely used in RAID-6 storage systems, provide data availability with up to two disk failures. To reduce the probability of data unavailability, whenever a single disk fails, disk recovery will be carried out. We find that the conventional recovery schemes of RDP and EVENODD codes for a single failed disk only use one parity disk. However, there are two parity disks in the system, and both can be used for single disk failure recovery. In this talk, we will introduce a hybrid recovery approach which uses both parities for single disk failure recovery and design efficient recovery schemes for RDP code (RDOR-RDP) and EVENODD code (RDOR-EVENODD). RDOR-RDP and RDOR-EVENODD have the following attractive properties: (1) read optimality in the sense that they issues the smallest number of disk reads to recover a single failed disk and it reduces approximately 1/4 of disk reads compared with conventional schemes; (2) load balancing property in that all surviving disks will be subjected to the same (or almost the same) amount of additional workload in rebuilding the failed disk. Experimental results show the advantage of RDOR-RDP and RDOR-EVENODD in distributed storage.


P. Vijay Kumar (Indian Institute of Science)

Title: Efficient Code Constructions for Reliable Distributed Storage

Abstract: In a distributed storage (DS) system, information is dispersed across nodes in a network in such a manner that an end-user can retrieve the data stored by tapping into a subset of the nodes. When compared with data replication, erasure codes such as RS codes provide increased resiliency in the face of node failures.

Regenerating codes, introduced by Dimakis et. al. go one step beyond erasure codes by minimizing in addition, the amount of data that has to be downloaded to bring up a failed node. The first part of the talk will provide an overview of some optimal constructions for regenerating codes. The second part will describe a recent simple, yet promising framework for the construction of a DS code. Under this framework, the DS code can be built up of any pair of erasure codes and further, data download or node repair is accomplished simply through erasure decoding of the constituent codes.


Kenneth Shum (The Chinese University of Hong Kong)

Title: Cooperative Regenerating Codes for Distributed Storage Systems

Abstract: In large distributed storage systems, two or more storage nodes may fail simultaneously. The problem of repairing multiple failures is addressed in this talk. The cooperative recovery process is divided into two phases. In the first phase, the new storage nodes to be regenerated first download some data packets from the surviving storage nodes. In the second phase, some data packets are exchanged among them. In compare to single-loss-recovery, the number of packet transmissions in the recovery process can be reduced. Some explicit constructions for cooperative regenerating codes will be presented.


Pak-Ching (Patrick) Lee (The Chinese University of Hong Kong)

Title: Design and Implementation of a Network-Coding-Based Distributed File System

Abstract: An emerging application of network coding is to improve the robustness of distributed storage. Recent work has shown that regenerating codes, which are based on the concept of network coding, can improve the data repair performance when some storage nodes are failed, as compared to traditional storage schemes such as erasure coding. However, there remain open issues regarding the feasibility of deploying regenerating codes in practice.

In this tutorial, we will study the design, implementation, and experimentation of a network-coding-based distributed file system that can be deployed in practice. We consider a file system that is built on FUSE, an open-source, user-space framework for developing different file system operations. We will discuss how to implement different storage schemes based on erasure codes and regenerating codes, and integrate them into a FUSE-based file system prototype. We will also address some technical considerations and challenges in implementation and real deployment.


Yuchong Hu (The Chinese University of Hong Kong)

Title: The Scaling Problem for Cloud Storage

Abstract: In cloud storage, consumers can outsource the storage of data to cloud vendors and purchase storage resources for dynamic demand. The dynamic demand refers to the scaling problem for cloud storage, which is addressed in this talk.

I will share my observations about different cases of scaling problem from MDS (n,k) to MDS (n+m,k) and MDS (n+m,k+m). I will discuss some of the results I’ve got: the lower bound of bandwidth cost for scaling and corresponding algorithms to match the bound.


Salim El Rouayheb (University of California, Berkeley)

Title: Data Security in Distributed Storage Systems

Abstract: With the growing trend of migrating storage and computation to the cloud, data security in distributed storage systems is becoming a prime concern. These systems pose new theoretical challenges due to the dynamics created by nodes constantly leaving and joining the system. For instance, one malicious "elder" node may corrupt the whole system by always "teaching" younger nodes wrong lessons (i.e., sending them corrupt data).

How much data can we store securely in distributed systems? How about efficient codes and algorithms that guarantee security? Is there a way to catch the bad nodes? And, in general, what is the effect of time on security? Distinguishing between passive (eavesdropping) and active (malicious) intruders, I will present partial answers to the questions above from a fundamental information-theoretic perspective and highlight many interesting open problems.


James She (Hong Kong University of Science and Technology)

Title: Making Network Coding Social For Better Data Storage and Content Delivery

Abstract: A discussion and some insights on exploring how social networking can bring network coding more effective for better data storage and content delivery in today and future wired/wireless Internet that is loaded with social media.