Additional Readings in Distributed and Cluster Computing
Index
- Background, Internet
- Distributed Systems
- Clusters
- Distributed Communication: networking, message passing, RPC, distributed shared memory, etc.
- Network RAM, Network Swapping
- Event Ordering and Distributed State
- Replication and Fault Tolerance and Recovery
- Distributed Coordination
- Naming in Distributed Systems
- Distributed File and Storage Systems
- Distributed Hash Tables
- Peer-to-Peer Systems
- Blockchain, Bitcoin
- Authentication and Security
- The Grid (Metacomputing) and Grid Security
- Web Computing and Security
- Scheduling
- Process Migration, Load Balancing
- Performance Tools
- Distributed Database Management Systems
- Parallel Algorithms
- Supercomputers
- Heterogeneous Systems, Accelerators, GPGPU
- Misc.
Background
- Brief History of the Internet
Barry M. Leiner, Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, Stephen Wolff
- Internet: The Big Picture
- Internet Timeline
-
End-To-End Arguments in System Design
J.H. Saltzer, D.P. Reed and D.D. Clark. ACM Transactions on Computer Systems,
4(4):277-288, November 1984
-
Distributed Operating Systems
Andrew S. Tanenbaum and Robbert
Van Renesse; ACM Comput. Surv. 17, 4 (Dec. 1985), Pages 419 - 470.
- A Comparison of Two Distributed Systems: Amoeba and Sprite
Fred Douglis, John K. Ousterhout, M. Frans Kaashoek, Andrew S. Tanenbaum
-
Specifying graceful degradation in distributed systems,
M. P. Herlihy and J. M. Wing.
Proceedings Sixth ACM Symposium on Principles of Distributed
Computing, Vancouver, British Columbia, Canada, 1987, pages 167-177.
-
Distributed systems
L. Kleinrock.
Communications of the ACM 28(11):1200-1213, November 1985.
-
Design and implementation of a distributed virtual machine for networked
computers
Emin Gun Sirer, Robert Grimm, Arthur J. Gregory, Brian N. Bershad
(University of Washington),
Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP),
Charleston, South Carolina, December, 1999: pp: 202-216
-
Plan 9 From Bell Labs Rob Pike, Dave Presotto, Sean Dorward,
Bob Flandrena, Ken Thompson, Howard Trickey, and Phil Winterbottom,
Computing Systems, Vol 8 #3, Summer 1995, pp. 221-254.
Plan 9 homepage
- Introduction to Parallel Computing and Cluster Computers,
by Dave Turner - Ames Laboratory
Clusters
-
Cluster Computing at a Glance Mark Baker and Rajkumar Buyya,
High Performance Cluster Computing, Volume 1, Chapter 1, Prentice Hall, 1999.
-
High-Performance Computing: Clusters, Constellations, MPPs, and Future
Directions Jack Dongarra, Thomas Sterling, Horst Simon,
and Erich Strohmaier, IEEE Computing in Science and Engineering, Volume 7 No. 2, March/April 2005
-
What's Next in High Performance Computing Gordon Bell, Jim Gray,
Communications of the ACM, Volume 45 Issue 2, February 2002
-
A Case for NOW (Networks of Workstations)
T. Anderson, D. E. Culler, D. A. Patterson, et al.
-
BEOWULF: A PARALLEL WORKSTATION FOR SCIENTIFIC COMPUTATION
Donald J. Becker, Thomas Sterling, Daniel Savarese, John E. Dorband,
Udaya A. Ranawake, Charles V. Packer,
Proceedings, International Conference on Parallel Processing, 1995
-
Scalable Cluster Computing with MOSIX for Linux
Barak, La'adan, Shiloh,
Proc. Linux Expo '99, pp. 95-100, Raleigh, N.C., May 1999.
-
Introduction to Single System Imaging in Clusters
Bruce J. Walker, Compaq.
-
"GLUnix: A Global Layer Unix for a Network of Workstations",
Douglas P. Ghormley, David Petrou, Steven H. Rodrigues,
Amin M. Vahdat, Thomas E. Anderson,
Software Practice and Experience.
-
"Efficient, Portable, and Robust Extension of Operating System Functionality"
Amin M. Vahdat, Douglas P. Ghormley, and Thomas E. Anderson,
UC Berkeley Technical Report CS-94-842, December, 1994.
- File Systems for Clusters from a Protocol Perspective
Braam, P.J. Second Extreme Linux Topics Workshop Jun. 1999, Monterey CA
-
Resource Aware Cluster Computing
James D. Teresco, Jamal Faik, Joseph E. Flaherty,
IEEE Computing in Science and Engineering, March/April 2005 (Vol. 7, No. 2)
Distributed Communication
-
The Design Philosophy of the DARPA Internet Protocols
David D. Clark, Proceedings of the 1988 SIGCOMM Symposium, pp 106-114,
Stanford, CA, August 1988.
-
Architectural Considerations for a New Generation of Protocols
D.D. Clark and D.L. Tennenhouse, In Proceedings of the 1990 SIGCOMM
Symposium on Communications Architectures and Protocols, pp. 200-208,
Philadelphia, PA, September 1990.
- A Brief History
of the Internet Barry M. Leiner, Vinton G. Cerf, David D. Clark,
Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch,
Jon Postel, Larry G. Roberts, Stephen Wolff
- Ethernet: Distributed Packet Switching for Local Computer Networks,
Robert M. Metcalfe and David R. Boggs,
Communications of the ACM, Vol. 19, No. 5, July 1976 pp. 395 - 404
-
Masking the Overhead of Protocol Layering
Robbert van Renesse, Proceedings of the 1996 ACM SIGCOMM Conference,
Stanford, September 1996
-
MBone: The Multicast Backbone
H. Eriksson, Communications of the ACM, 37(8):54-60, August 1994.
-
"Building reliable, high-performance communication systems from components",
Xiaoming Liu, Christoph Kreitz, Robbert van Renesse, Jason Hickey, Mark
Hayden, Kenneth Birman, and Robert Constable (Cornell University),
Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP),
Charleston, South Carolina, December, 1999: pp: 80-92
-
"Building TCP/IP Active Messages"
Lok Tin Liu, Alan Mainwaring, Chad Yoshikawa,
Berkeley NOW Project White Paper, 1994.
-
Improving The Performance of Distributed Applications Using Active Networks
Ulana Legedza, David J. Wetherall, and John Guttag
Appears in IEEE INFOCOM'98.
-
Towards an Active Network
D. Tennenhouse and D. Wetherall, ACM SIGCOMM CCR, Vol. 26, No. 2, April 1996.
- "High Level Programming for Distributed Computing", J. A. Feldman,
Communications of the ACM, 22(6), June 1979, pp. 353-368.
- Intro
to Socket Programming From University of Wisconsin
- Davin's collection of unix programming links Lots of links to Network programming
references.
Message Passing Libraries
-
The PVM Concurrent Computing System: Evolution, Experiences, and Trends
V. S. Sunderam, G. A. Geist, J. Dongarra, R. Manchek, ACM Journal of Parallel
Computing, vol. 20 no. 4, 1994
-
A message passing standard for MPP and workstations
J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker,
CACM, 39(7), 1996, pp. 84-90
-
"PVM: A Framework for Parallel Distributed Computing",
V. S. Sunderam,
Concurrency: Practice and Experience, 2(4), pp. 315-339, December, 1990.
-
"A Users' Guide to PVM Parallel Virtual Machine",
A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam,
Oak Ridge National Laboratory, ORNL/TM-12187, September, 1994
RPC
-
Lightweight Remote Procedure Call
B. N. Bershad, T. E. Anderson, E. D. Lazowska and H. M. Levy, ACM
Transactions on Computer Systems 8, 1 (February 1990), 37-55
-
Reflective Remote Method Invocation
George K. Thiruvathukal, Lovely S. Thomas, and Andy T. Korczynski,
ACM Java '98, Stanford University, Palo Alto, CA and Concurrency: Practice
and Experience 1998.
-
Implementing remote procedure calls
Andrew D. Birrell and Bruce Jay Nelson, ACM Transactions on Computer
Systems, 2(1):39-59, February 1984.
-
"Performance of the Firefly RPC"
M. D. Schroeder and M. Burrows,
ACM Trans. on Computer Systems, 8(1), February 1990, pp. 1-17.
Some on-line references:
- Whitepaper - RMI
- RMI
Distributed Shared Memory
-
Memory Coherence in Shared Virtual Memory Systems
K. Li and P. Hudak.
ACM Trans. Computer Systems Vol. 7, No. 4. Nov. 1989. pp. 321-359.
-
TreadMarks: Distributed shared memory on standard workstations
and operating systems
A. Cox, S. Dwarkadas, P. Keleher, and W. Zwaenepoel.
Proceedings of the Winter 94 Usenix Conference,
USENIX Assoc., Berkeley, Calif. pp. 115-131
-
"An Evaluation of Software Based Release Consistent Protocols"
Pete Keleher, Alan L. Cox, Sandhya Dwarkadas, Willy Zwaenepoel,
JPDC, 29(2), Sept. 1995, pp 126-141.
-
"Towards Transparent and Efficient Software Distributed Shared Memory"
D.J. Scales and K. Gharachorloo, 16th Symposium on Operating Systems
Principles, Saint Malo, France, October 1997, pp. 157-169.
-
"Cashmere-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network",
R. Stets, S. Dwarkadas, N. Hardavellas, G. Hunt, L. Kontothanassis, S.
Parthasarathy, and M. Scott,
16th Symposium on Operating Systems Principles, Saint Malo, France,
October 1997, pp. 170-183.
- Scalable fault-tolerant distributed shared memory,
Florin Sultan, Liviu Iftode, Thu Nguyen,
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
-
OpenMP: An Industry Standard API for Shared Memory Programming,
Leonardo Dagum, Ramesh Menon, IEEE Computing in Science and Engineering,
January-March 1998, Vol. 5, No. 1
- www.openmp.org
Distributed Data Structures/Objects
-
The S/Net's Linda Kernel
N. Carriero and D. Gelernter, ACM Trans. on Computer Systems, 4(2),
May 1986, pp. 110-129.
-
Network Objects
A. Birrell, G. Nelson, S. Owicki, and E. Wobber (DEC SRC).
In Proceedings of the 14th ACM Symposium on Operating Systems Principles,
pp. 217-230, Asheville, NC, December 1993.
- "Distributed Component Object Model (DCOM) Binary Protocol",
Nat Brown and Charlie Kindel,
Network Working Group Microsoft Corporation, May 1996,
-
"Bringing Distributed Objects to the World Wide Web"
Ron I. Resnick
-
"Generative communication in Linda"
David Gelernter,
ACM Trans. Program. Lang. Syst. 7, 1 (Jan. 1985), Pages 80 - 112
- Some on-line references:
- CORBA meets Java, JavaWorld October 1997
- OMG Homepage
- Distributed Object Computing with CORBA
- Concurrent Programming in Java
Also look for CORBA and JavaSpaces documents on-line
Network RAM
-
Nswap: A Network Swapping Module for Linux Clusters,
Tia Newhall, Sean Finney, Kuzman Ganchev, Michael Spiegel.
In Proceedings of Euro-Par'03 International Conference on
Parallel and Distributed Computing, Klagenfurt, Austria, August 2003.
-
Implementing Global Memory Management in a Workstation Cluster
Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin,
Henry M. Levy, and Chandramohan A. Thekkath, 15th ACM Symposium on
Operating Systems Principles, December, 1995.
-
Implementation of a Reliable Remote Memory Pager
Evangelos P. Markatos and George Dramitinos, USENIX 1996 Annual Technical Conference
-
Parallel Network RAM: Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs
John Oleszkiewicz, Li Xiao, Yunhao Liu, 2004 International Conference on
Parallel Processing (ICPP'04), August 2004, pp. 353-360
-
Adaptive memory allocations in clusters to handle unexpectedly large
data-intensive jobs, Li Xiao, Songqing Chen, and Xiaodong Zhang
IEEE Transactions on Parallel and Distributed Systems, Vol. 15, No. 7,
2004, pp. 577-592.
-
Availability and Utility of Idle Memory in Workstation Clusters
-
Incorporating Job Migration and Network RAM to Share Cluster
Memory Resources
-
Remote Memory Paging in Networks of Workstations
-
The Network RamDisk: Using Remote Memory on Heterogeneous NOWs
-
Memory Servers for Multicomputers
-
Collaborative Memory Pool in Cluster System,
Nan Wang, Xuhui Liu, Jin He, Jizhong Han, Lisheng Zhang, Zhiyong Xu,
Proceedings of the 2007 IEEE International Conference on Parallel Processing
-
Performance analysis of a user-level memory server, S. Pakin and G. Johnson, 2007 IEEE International Conference on Cluster Computing
- Remote Paging references
Event Ordering and Distributed State
-
Time, clocks, and the ordering of events in a distributed system
Leslie Lamport, Communications of the ACM, 21(7):558-565, July 1978.
-
Distributed snapshots: determining global states of distributed systems
K. Mani Chandy and Leslie Lamport; ACM Trans. Comput.
Syst. 3, 1 (Feb. 1985), Pages 63 - 75
-
The Role of Distributed State
John K. Ousterhout, University of California Berkeley
Replication and Fault Tolerance and Recovery
-
"Replicated Distributed Programs", E. C. Cooper,
10th ACM Symposium on Operating Systems Principles (SOSP), Orcas Island, WA,
December 1985 pp. 63-78.
-
"Replication and Fault-Tolerance in the ISIS System", K.P. Birman,
10th Symposium on Operating Systems Principles, Orcas Island, WA,
December 1985
-
"Reliable Communication in the Presence of Failures",
K.P. Birman and T.A. Joseph,
ACM Transactions on Computer Systems, 5(1), February 1987, pp. 47-76.
-
Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous
Environments, Felix C. Gärtner,
ACM Computing Surveys, 31(1), March 1999
-
Weighted voting for replicated data, D. K. Gifford.
Proceedings Seventh ACM Symposium on Operating Systems Principles,
Pacific Grove, California, December 1979, pages 150-162.
-
Paxos Made Simple Leslie Lamport, November 2001
-
"The Byzantine Generals Problem"
L. Lamport, R. Shostak, and M. Pease,
ACM Transactions on Programming Languages and Systems, 4(3), July 1982, pp. 382-401.
- Raft: In Search of an Understandable Consensus Algorithm,
Diego Ongaro and John Ousterhout, Stanford University Tech Report, 2014
-
Microreboot - A Technique for Cheap Recovery,
George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox, Stanford University, OSDI'04
-
Path-Based Failure and Evolution Management, Mike Y. Chen, University of California, Berkeley; Anthony Accardi, Tellme; Emre Kiciman, Stanford University; Dave Patterson, University of California, Berkeley; Armando Fox, Stanford University; Eric Brewer, University of California, Berkeley, NSDI'04
- Simple Testing Can Prevent Most Critical Failures:
An Analysis of Production Failures in Distributed Data-Intensive
Systems, Yuan, Luo, Zhuang, Rodrigues, Zhao, Zhang, Jain,
Stumm, USENIX OSDI'14
Distributed Coordination
-
"Communicating Sequential Processes"
C.A.R. Hoare,
Communications of the ACM, 21(8), August 1978, pp. 666-677.
-
Experiences with Processes and Monitors in Mesa
Butler W. Lampson, David D. Redell,
Communications of the ACM, 23(2), February 1980, pp. 105-117.
-
Scheduler activations: effective kernel support for the user-level management
of parallelism.
Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska and Henry M. Levy;
Proceedings of the thirteenth ACM symposium on Operating systems principles,
1991, Pages 95-109
- ZooKeeper: Wait-free coordination for Internet-scale systems,
Hunt, Konar, Junqueira, Reed, in proceedings of
Naming in Distributed Systems
-
Development of the Domain Name System
Paul V. Mockapetris, Kevin J. Dunlap,
ACM SIGCOMM Computer Communication Review, Volume 25 Issue 1
January 1995
- "Grapevine: An Exercise in Distributed Computing",
Andrew D. Birrell, Roy Levin, Roger M. Needham, Michael D. Schroeder,
Communications of the ACM, 25(4), April 1982, pp. 260-274.
-
Decentralizing a global naming service for improved performance and
fault tolerance
D. R. Cheriton and T. P. Mann.
ACM Transactions on Computer Systems 7(2):147-183, May 1989.
Peer-to-Peer Systems
-
Chord: a scalable peer-to-peer lookup protocol for internet applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan,
IEEE/ACM Transactions on Networking (TON), Volume 11 Issue 1, February 2003
-
Making Gnutella-like P2P systems scalable
Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, Scott Shenker,
Proceedings of the 2003 conference on Applications, technologies,
architectures, and protocols for computer communications, August 2003
- Kademlia: A peer-to-peer information system based on the XOR metric,
Maymounkov and Mazieres, in Proceedings of IPTPS '02
-
Ivy: A Read/Write Peer-to-Peer File System
Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen,
Proceedings of OSDI'02, 2002.
-
"A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems",
Ragib Hasan, Zahid Anwar, William Yurcik, Roy Campbell,
IEEE International Conference on Information Technology (ITCC), Las Vegas,
NV, April 2005
-
"Incentives Build Robustness in BitTorrent", Bram Cohen, 2003
- Wide-Area Cooperative
Storage with CFS, Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, SOSP'01
- Storage Management and
Caching in PAST, A Large-scale, Persistent Peer-to-peer Storage Utility,
Antony Rowstron, Peter Druschel, SOSP'01
-
An Analysis of Internet Content Delivery Systems Stefan Saroiu,
Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy,
University of Washington, Proceedings of USENIX OSDI 2002.
-
Technical and social components of peer-to-peer computing:
Extracting guarantees from chaos
John Kubiatowicz, Communications of the ACM, Volume 46 Issue 2 February 2003
-
A Scalable Content-Addressable Network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker (UCB), ACM SIGCOMM, 2001
-
Samsara: honor among thieves in
peer-to-peer storage
Landon P. Cox, Brian D. Noble,
Proceedings of the nineteenth ACM symposium on Operating systems principles
October 2003
-
Technical and social components of peer-to-peer
computing: Looking up data in P2P systems
Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica,
Communications of the ACM, Volume 46 Issue 2, February 2003
-
Distributed object location in a dynamic network
Kirsten Hildrum, John D. Kubiatowicz, Satish Rao, Ben Y. Zhao,
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms
and architectures, August 2002
-
Finding Good Peers in Peer-to-Peer Networks
Murali Krishna Ramanathan, Vana Kalogeraki,
Jim Pruyne, HP Laboratories Palo Alto
- FAWN: A Fast Array of Wimpy Nodes, Andersen, Franklin, Kaminsky,
Phanishayee, Tan, Vasudevan, in SOSP'09
Distributed File and Storage Systems
-
"Spritely NFS: experiments with cache-consistency protocols",
V. Srinivasan and J. Mogul,
Proceedings of the Twelfth ACM symposium on Operating Systems Principles,
December 3 - 6, 1989, Litchfield Pk., AZ USA
-
"Scalable, Secure, and Highly Available Distributed File Access",
M. Satyanarayanan,
IEEE Computer, May 1990, Vol. 23, No. 5
-
A Case for Redundant Array of Inexpensive Disks (RAID)
David A. Patterson, Garth Gibson, Randy H. Katz,
ACM SIGMOD International Conference on
Management of Data, 1988, pp. 109-116.
-
"Zebra: A Striped Network File System"
John Hartman and John Ousterhout,
In the Proceedings of the USENIX Workshop on File Systems.
-
"The Design and Implementation of a Log-Structured File System",
Mendel Rosenblum and John K. Ousterhout,
Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles,
1991, Pages 1 - 15
-
"Serverless Network File Systems",
Tom Anderson, Michael Dahlin, Jeanna Neefe, David Patterson,
Drew Roselli, Randy Wang,
15th Symposium on Operating Systems Principles,
ACM Transactions on Computer Systems, 1995.
-
The Google file system
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung,
Proceedings of the nineteenth ACM symposium on Operating systems principles,
October 2003
-
Taming Aggressive Replication in the Pangaea Wide-Area File System,
Yasushi Saito, Christos Karamanolis, Magnus Karlsson, and Mallik Mahalingam, HP Labs, Proceedings of OSDI'02
-
Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs
W. Bolosky, J. Douceur, D. Ely, and M. Theimer.
In SIGMETRICS, pages 34-43, 2000.
-
Scale and Performance in a Distributed File System
J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham,
and M. West, ACM Transactions on Computer Systems, Vol. 6, No. 1,
February 1988, pp. 51-81.
-
The Sun Network Filesystem: Design, Implementation and Experience
Russel Sandberg,
Sun Microsystems, Inc.
-
Design and Implementation of the Sun Network Filesystem
Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, Bob Lyon,
Sun Microsystems, Inc.
- "The LOCUS Distributed Operating System",
Bruce Walker, Gerald Popek, Robert English, Charles Kline, Greg Thiel,
9th Symposium on Operating Systems Principles (SOSP), Bretton Woods, New Hampshire, October 1983, pp. 49-70.
- Replication in the Harp File System,
Liskov, Ghemawat, Gruber, Johnson, Shrira, Williams,
in proceedings of
- "Storage Management and Caching in PAST, A Large-scale, Persistent
Peer-to-peer Storage Utility",
Antony Rowstron, Peter Druschel, Proceedings of SOSP'01
- Wide-area cooperative storage with CFS,
Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica,
in proceedings of
- Petal: Distributed Virtual Disks, Lee and Thekkath,
Proceedings of ASPLOS'96
- Frangipani: A Scalable Distributed File System, Thekkath, Mann, Le
- Spanner: Google's Globally-Distributed Database, in OSDI'12
Distributed Hash Tables
- Scalable, Distributed Data Structures for Internet Service Construction, Steven D. Gribble, Eric A.
Brewer, Joseph M. Hellerstein, David Culler, OSDI'00
- Dynamo: Amazon's Highly Available Key-value Store,
DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin,
Sivasubramanian, Vosshall and Vogels, in Proceedings of SOSP'07
- Bigtable: A Distributed Storage System for Structured Data,
Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber,
Google, in Proceedings of OSDI'06
- WiscKey: Separating Keys from Values in SSD-conscious Storage,
Lu, Pillai, A. Arpaci-Dusseau, R. Arpaci-Dusseau,
Proceedings of USENIX FAST'16
- TAO: Facebook's Distributed Data Store for the Social Graph,
Bronson et al., in USENIX ATC'13
-
Chord: a scalable peer-to-peer lookup protocol for internet applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan,
IEEE/ACM Transactions on Networking (TON), Volume 11 Issue 1, February 2003
Distributed Database Management Systems
-
"Transaction Management in the R* Distributed Database Management System"
Mohan, Lindsay, and Obermark,
TODS, 11(4), 1986
-
The Dangers of Replication and a Solution,
Gray, Helland, O'Neil, and Shasha,
Proceedings of the ACM SIGMOD Conference, 1996
-
Mariposa: A Wide-Area Distributed Database, Stonebraker et al.,
VLDB Journal 5, 1996
-
Client-Server Paradise, D. DeWitt, J. Patel, J. Luo, and J. Yu,
Proceedings of the 1994 VLDB Conference, Chile, August 1994.
Blockchain, Bitcoin
- Bitcoin: A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto, 2008
- Ripple distributed consensus algorithm, Ripple whitepaper, 2014
Authentication and Security
-
"The Internet Worm: Crisis and Aftermath"
Eugene H. Spafford, Communications of the ACM, 32 (6): 678-687, June 1989
-
"Encryption and Secure Computer Networks"
Gerald J. Popek, Charles S. Kline,
Computing Surveys, 11(4), December 1979, pp. 331-356.
-
"Kerberos: an Authentication Service for Computer Networks",
B. Clifford Neuman and Theodore Ts'o,
IEEE Communications, 32(9):33-39, September 1994.
-
"New directions in cryptography" Diffie, W.; Hellman, M.,
IEEE Transactions on Information Theory, Volume 22, Issue 6, Nov 1976
-
"A Logic of Authentication", M. Burrows, M. Abadi, and R. Needham,
12th Symposium on Operating Systems Principles, Litchfield Park, AZ,
December 1989, pp. 1-13.
-
Decentralized user authentication in a global file system
Michael Kaminsky, George Savvides, David Mazieres, M. Frans Kaashoek
Proceedings of the nineteenth ACM symposium on Operating systems principles,
October 2003
-
"Using Encryption for Authentication in Large Networks of Computers"
R. M. Needham and M. D. Schroeder,
Communications of the ACM, 21(12), December 1978, pp. 993-999.
-
How SSL Works,
http://developer.netscape.com/tech/security/basics/index.html
-
Exploring RSA Encryption in OpenSSL
Linux Journal, September 25, 2003 by James Tandon
-
Authentication in distributed systems: Theory and practice.
B. Lampson, M. Abadi, M. Burrows, and E. Wobber.
ACM Transactions on Computer Systems 10, 4 (Nov. 1992), pp. 265-310.
-
"Kerberos: An authentication service for open network systems."
J. G. Steiner, B. C. Neuman, and J. I. Schiller
In Proceedings of the Winter 1988 Usenix Conference, pages 191-201,
February 1988
- Extensible Security Architectures for Java
Dan S. Wallach, Dirk Balfanz, Drew Dean, and Edward W. Felten. 16th
Symposium on Operating Systems Principles (Saint-Malo, France), October 1997
- Reflections on
Trusting Trust, Ken Thompson, Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763.
The Grid (Meta-Computing)
-
The Anatomy of the Grid: Enabling Scalable Virtual Organizations,
I. Foster, C. Kesselman, S. Tuecke,
International Journal of Supercomputer Applications, 15(3), 2001.
-
Grids, the TeraGrid, and Beyond,
Daniel Reed, IEEE Computer, January 2003 (Vol. 36, No. 1)
-
Grid Services for Distributed System Integration,
I. Foster, C. Kesselman, J.M. Nick, S. Tuecke,
IEEE Computer, Volume 35, Issue 6, June 2002 pp.37 - 46
- www.globus.org
-
"Globus: A Metacomputing Infrastructure Toolkit",
I. Foster and C. Kesselman,
International Journal of Supercomputer Applications, 11(2):115-128, 1997.
-
"Legion--A View From 50,000 Feet",
Andrew S. Grimshaw and William A. Wulf,
Proceedings of the Fifth IEEE International Symposium on High Performance
Distributed Computing, IEEE Computer Society Press, Los Alamitos, CA, August 1996
-
Javelin: Internet-Based Parallel Computing Using Java
P. Cappello, B. O. Christiansen, Mihai F. Ionescu, M. O. Neary, K. E. Schauser.
June 20, 1997 ACM Workshop on Java for Science and Engineering Computation, Las Vegas.
-
WebOS: Operating System Services For Wide Area Applications
Amin Vahdat, Thomas Anderson, Michael Dahlin,
David Culler, Eshwar Belani, Paul Eastham, and Chad Yoshikawa. July 1998.
The Seventh IEEE Symposium on High Performance Distributed Computing.
Meta-computing Security
-
"A Security Architecture for Computational Grids",
I. Foster, C. Kesselman, G. Tsudik and S. Tuecke,
Proc. 5th ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
-
A Flexible Security System for Metacomputing Environments
Adam Ferrari, Frederick Knabe, Marty Humphrey, Steve Chapin, and Andrew
Grimshaw, Proceedings of HPCN'99 (High-Performance Computing and Networking),
April 1999, Amsterdam, The Netherlands.
- A New Model of Security for Metasystems
Steve Chapin, Chenxi Wang, William Wulf, Fritz Knabe, Andrew Grimshaw,
University of Virginia Technical Report CS-95-34, August 1995
Web Computing
-
"Maintaining Strong Cache Consistency in the World-Wide Web",
C. Liu and P. Cao,
17th International Conf. on Distributed Computing Systems, 1997.
-
"On the scale and performance of cooperative Web proxy caching",
Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell,
Anna Karlin, and Henry M. Levy, Proc. of the 17th ACM Symposium on Operating
Systems Principles (SOSP '99), December 1999.
-
"Design Considerations for Integrated Proxy Servers"
S. Sahu, P. Shenoy, and D. Towsley,
Proc. 9th IEEE Int'l. Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV'99), June 1999, pp. 247-250.
-
World-Wide Web Cache Consistency,
Gwertzman, J., Seltzer, M.,
Proceedings of the 1996 Usenix Technical Conference, San Diego, CA January 1996.
- Websys
Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Molly Brown,
Tashana Landray, Denise Pinnel, Anna Karlin, and Henry Levy.
Security and the Web
-
"Weaving a Web of Trust"
Rohit Khare and Adam Rifkin,
(working draft of the version that appeared
in the World Wide Web Journal, Summer 1997, Volume 2, Number 3, pages 77-112).
Web File Systems
-
AFS and the Web: Competitors or Collaborators?
M. Satyanarayanan and Mirjana Spasojevic.
Proceedings of the Seventh ACM SIGOPS European Workshop, Connemara, Ireland
September 1996
-
WebFS: A Global Cache Coherent Filesystem
Amin Vahdat, Paul Eastham,
and Thomas Anderson. December 1996. Technical Draft.
Computer Science Division, University of California Berkeley
-
WebNFS: Filesystem for the Internet
Brent Callaghan
Sun Microsystems, Inc. technical report, April 1997
Scheduling
-
"The Interaction of Parallel and Sequential Workloads on a Network of
Workstations"
Remzi H. Arpaci, Andrea C. Dusseau, Amin M. Vahdat, Lok T. Liu,
Thomas E. Anderson, and David A. Patterson, SIGMETRICS 1995
-
"Scheduling with Implicit Information in Distributed Systems"
Andrea C. Arpaci-Dusseau, David E. Culler, Alan Mainwaring,
Sigmetrics'98 Conference on the Measurement and Modeling of Computer Systems
-
"A closer look at coscheduling approaches for a network of workstations"
Shailabh Nagar, Ajit Banerjee, Anand Sivasubramaniam and Chita R. Das,
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and
architectures, 1999, Pages 96 - 105
-
Gang Scheduling
-
Gang Scheduling on Clusters
-
Self Scheduling in Clusters
-
Task Scheduling on Clusters
-
Workload Management, more than just job scheduling
-
Adaptive Scheduling on Clusters
Process Migration, Load Balancing
-
Process Migration Milojicic, Douglis, Paindaveine, Wheeler, Zhou 1999.
-
Checkpoint and Migration of UNIX Processes in the Condor Distributed
Processing System, Michael Litzkow, Todd Tannenbaum, Jim Basney, Miron Livny
Computer Sciences Technical Report #1346, University of Wisconsin-Madison,
April 1997
-
"Deploying a High Throughput Computing Cluster",
Jim Basney and Miron Livny,
High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
-
"Process Migration in DEMOS/MP",
M.L. Powell and B.P. Miller,
9th Symposium on Operating Systems Principles, Bretton Woods, NH, October 1983, pp. 110-119.
-
"The Kangaroo Approach to Data Movement on the Grid",
D. Thain, J. Basney, S.-C. Son, and M. Livny,
Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10),
San Francisco, California, August 7-9, 2001
- Exploiting idle periods in Clusters
- References
to online documents about process migration, checkpointing and load balancing
Performance Tools
-
"The Paradyn Parallel Performance Measurement Tools",
B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth,
R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall,
IEEE Computer, Nov. 1995. 28(11), pp. 37-46.
-
"Scalable Performance Analysis: The Pablo Performance Analysis Environment"
D. A. Reed, R. A. Aydt, R. J. Noe, P. C. Roth, K. A. Shields, B. W. Schwartz,
and L. F. Tavera,
Proceedings of the Scalable Parallel Libraries Conference,
A. Skjellum, Editor. 1993, IEEE Computer Society.
Parallel Algorithms
-
Parallel Algorithms for Regular Architectures
Russ Miller and Quentin F. Stout, MIT Press, 1996.
Supercomputers
-
An Overview of the BlueGene/L Supercomputer, The BlueGene/L Team,
Proceedings of the IEEE SC2002, 2002.
- Top 500 list
GPUs
- gpgpu.org
-
Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability, Boyer, Skadron, Che, Jayasena, 2013
Misc.
Does Systems Research Measure Up?
Small, C., Ghosh, N., Saleeb, H., Seltzer, M., Smith, K.,
Harvard University Computer Science Technical Report
TR-16-97, November 1997.