
2-Application Layer

Time: 2016-07-19 13:53:01


Please indicate the source: http://blog.csdn.net/gaoxiangnumber1
Welcome to my github: https://github.com/gaoxiangnumber1

2.1 Principles of Network Applications

2.1.1 Network Application Architectures

  • An application’s architecture is distinct from the network architecture (e.g., the five-layer Internet architecture):
    1. The application architecture is designed by the application developer and dictates how the application is structured over the various end systems. The two predominant application architectures are the client-server architecture and the peer-to-peer (P2P) architecture.
    2. The network architecture is fixed and provides a specific set of services to applications.
  • In a client-server architecture, there is an always-on host (the server), which services requests from many other hosts (the clients).
  • In a P2P architecture, there is minimal (or no) reliance on dedicated servers in data centers. Instead the application exploits direct communication between pairs of intermittently connected hosts, called peers. The peers are not owned by the service provider, but are instead desktops and laptops controlled by users, with most of the peers residing in homes. Because the peers communicate without passing through a dedicated server, the architecture is called peer-to-peer.

2.1.2 Processes Communicating

  • A network application consists of pairs of processes that send messages to each other over a network. For each pair of communicating processes, we label one as the client and the other as the server. The process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled as the client. The process that waits to be contacted to begin the session is the server.
  • A process sends messages into, and receives messages from, the network through a software interface called a socket. A socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network.
  • The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side of the socket. The only control that the application developer has on the transport-layer side is
    (1) the choice of transport protocol and
    (2) the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes.
    Once the application developer chooses a transport protocol, the application is built using the transport-layer services provided by that protocol.
  • In order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address. Identifying the receiving process requires two pieces of information:
    (1) the address of the host;
    (2) an identifier that specifies the receiving process in the destination host.
  • The host is identified by its IP address, which is a 32-bit quantity in IPv4.
    The receiving process is identified by a destination port number.
  • Popular applications have been assigned specific port numbers. For example, a Web server is identified by port number 80. A mail server process (using the SMTP protocol) is identified by port number 25.
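The roles described above (a server process that waits to be contacted, a client process that initiates contact, and an address made of an IP address plus a port number) can be sketched with Python's standard socket module. This is a minimal illustration, not part of the original notes: the function names are made up for the example, and it assumes the loopback address 127.0.0.1 is usable, with port 0 asking the operating system for any free port instead of a well-known one such as 80 or 25.

```python
import socket
import threading


def run_echo_server(listener):
    """Server role: wait to be contacted, then answer one message."""
    conn, _client_addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)


def demo_tcp_exchange(message):
    """Run one client/server exchange on the local host and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)            # do not wait forever for a client
        host, port = listener.getsockname()  # the (IP address, port) pair

        server = threading.Thread(target=run_echo_server, args=(listener,))
        server.start()

        # Client role: initiate contact using the server's address.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.join()
        return reply


if __name__ == "__main__":
    print(demo_tcp_exchange(b"hello"))
```

The application code only decides what bytes go into the socket; everything below the socket (segmenting, retransmission, and so on) is handled by the transport layer.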

2.1.3 Transport Services Available to Applications

  • The services that a transport-layer protocol can offer to the applications invoking it fall into four categories: reliable data transfer, throughput, timing, and security.
  • When a transport protocol provides process-to-process reliable data transfer, the sending process can pass its data into the socket and know with confidence that the data will arrive without errors at the receiving process.
  • The available throughput between two processes in a network is the rate at which the sending process can deliver bits to the receiving process. A transport-layer protocol could provide guaranteed available throughput at some specified rate. With such a service, the application could request a guaranteed throughput of r bits/sec, and the transport protocol would then ensure that the available throughput is always at least r bits/sec.
  • Applications that have throughput requirements are said to be bandwidth-sensitive applications; applications that can make use of as much or as little throughput as happens to be available are elastic applications.
  • Timing guarantees can come in many forms. E.g.: every bit that the sender pumps into the socket arrives at the receiver’s socket no more than 100 msec later.
  • A transport protocol can provide an application with one or more security services. For example, in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering the data to the receiving process.

2.1.4 Transport Services Provided by the Internet

  • The Internet (TCP/IP networks) makes two transport protocols available to applications, UDP and TCP.
    TCP Services
  • The TCP service model includes a connection-oriented service and a reliable data transfer service.
    1. Connection-oriented service. TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for the stream of packets that follows. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time. When the application finishes sending messages, it must tear down the connection.
    2. Reliable data transfer service. The communicating processes can rely on TCP to deliver all data sent without error and in the proper order.
  • The TCP congestion-control mechanism throttles a sending process (client or server) when the network is congested between sender and receiver. TCP congestion control also attempts to limit each TCP connection to its fair share of network bandwidth.
    UDP Services
  • UDP is connectionless, so there is no handshaking before the two processes start to communicate.
  • UDP provides an unreliable data transfer service: when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process.
  • UDP does not include a congestion-control mechanism, so the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases.
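The connectionless nature of UDP can be seen in a short sketch: the sender never connects or handshakes, it simply addresses each datagram. This is an illustrative example under assumptions (loopback address, OS-chosen port, hypothetical function name). On the local host the datagram practically always arrives, but on a real network it could be lost, in which case the receive call below would time out.

```python
import socket


def demo_udp_exchange(message):
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(5)            # UDP gives no delivery guarantee
        # No handshake: the destination address travels with the datagram.
        sender.sendto(message, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(2048)
        return data


if __name__ == "__main__":
    print(demo_udp_exchange(b"ping"))
```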

2.1.5 Application-Layer Protocols

  • An application-layer protocol defines:
    1. The types of messages exchanged, for example, request messages and response messages.
    2. The syntax of the various message types, such as the fields in the message and how the fields are delineated.
    3. The semantics of the fields, that is, the meaning of the information in the fields.
    4. Rules for determining when and how a process sends messages and responds to messages.
  • Some application-layer protocols are specified in RFCs and are therefore in the public domain. For example, the Web’s application-layer protocol, HTTP, is available as an RFC. If a browser developer follows the rules of the HTTP RFC, the browser will be able to retrieve Web pages from any Web server that has also followed the rules of the HTTP RFC. Many other application-layer protocols are proprietary and intentionally not available in the public domain.
  • An application-layer protocol is only one piece of a network application. E.g.: The Web application consists of many components, including a standard for document formats (HTML), Web browsers, Web servers, and an application-layer protocol. The Web’s application-layer protocol, HTTP, defines the format and sequence of messages exchanged between browser and Web server. Thus, HTTP is only one piece of the Web application.

2.1.6 Network Applications Covered in This Book

2.2 The Web and HTTP

2.2.1 Overview of HTTP

  • HTTP is implemented in two programs: a client program and a server program. The client program and server program, executing on different end systems, talk to each other by exchanging HTTP messages. HTTP defines the structure of these messages and how the client and server exchange the messages.
  • A Web page consists of objects. An object is a file—such as an HTML file, a JPEG image—that is addressable by a single URL. Most Web pages consist of a base HTML file and several referenced objects. For example, if a Web page contains HTML text and five JPEG images, then the Web page has six objects.
  • Each URL has two components: the hostname of the server that houses the object and the object’s path name. For example, the URL: http://www.someSchool.edu/someDepartment/picture.gif has www.someSchool.edu for a hostname and /someDepartment/picture.gif for a path name.
  • HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients. When a user requests a Web page, the browser sends HTTP request messages for the objects in the page to the server. The server receives the requests and responds with HTTP response messages that contain the objects.
  • HTTP uses TCP as its underlying transport protocol. The HTTP client first initiates a TCP connection with the server. After the connection is established, the browser and the server processes access TCP through their socket interfaces. The client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface.
  • The server sends requested files to clients without storing any state information about the client. If a particular client asks for the same object twice in a period of a few seconds, the server resends the object. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol.
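The two URL components mentioned above (hostname and path name) can be pulled apart with Python's standard urllib.parse module. A minimal sketch; the helper name split_url is an assumption made for this example.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the (hostname, path name) components of a URL."""
    parts = urlsplit(url)
    # netloc keeps the hostname exactly as written in the URL.
    return parts.netloc, parts.path


if __name__ == "__main__":
    host, path = split_url("http://www.someSchool.edu/someDepartment/picture.gif")
    print(host)
    print(path)
```

The browser uses the hostname to decide which server to open a TCP connection to, and sends the path name inside the HTTP request message.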

2.2.2 Non-Persistent and Persistent Connections

  • In many Internet applications, the client and server communicate for an extended period of time, with the client making a series of requests and the server responding to each of the requests. The series of requests may be made back-to-back, periodically at regular intervals, or intermittently. When this client-server interaction takes place over TCP, the application developer needs to decide whether each request/response pair should be sent over a separate TCP connection or whether all of the requests and their corresponding responses should be sent over the same TCP connection. In the former approach, the application is said to use non-persistent connections; in the latter approach, persistent connections.
  • HTTP uses persistent connections in its default mode, but HTTP clients and servers can be configured to use non-persistent connections instead.
    HTTP with Non-Persistent Connections
  • Suppose the page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server. The URL for the base HTML file is http://www.someSchool.edu/someDepartment/home.index
    1. The HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80, which is the default port number for HTTP. Associated with the TCP connection, there will be a socket at the client and a socket at the server.
    2. The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index.
    3. The HTTP server process receives the request message via its socket, retrieves the object /someDepartment/home.index from its storage, encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.
    4. The HTTP server process tells TCP to close the TCP connection. But TCP doesn’t actually terminate the connection until it knows for sure that the client has received the response message intact.
    5. The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, examines the HTML file, and finds references to the 10 JPEG objects.
    6. The first four steps are then repeated for each of the referenced JPEG objects.
  • Each TCP connection transports exactly one request message and one response message. In this example, when a user requests the Web page, 11 TCP connections are generated.
  • Round-Trip Time (RTT) is the time it takes for a small packet to travel from client to server and then back to the client. The RTT includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays.
  • When a user clicks on a hyperlink, the browser initiates a TCP connection between the browser and the Web server, which involves a “three-way handshake”.
  • The client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and, finally, the client acknowledges back to the server. The first two parts of the three-way handshake take one RTT.
  • After completing the first two parts of the handshake, the client sends the HTTP request message combined with the third part of the three-way handshake (the acknowledgment) into the TCP connection. Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT.
  • The total response time is two RTTs plus the transmission time at the server of the HTML file.
    HTTP with Persistent Connections
  • Non-persistent connections have shortcomings.
    First, a new connection must be established and maintained for each requested object. This can place a significant burden on the Web server.
    Second, each object suffers a delivery delay of two RTTs: one RTT to establish the TCP connection and one RTT to request and receive an object.
  • With persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection. An entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection. Multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection. These requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining).
  • An HTTP server typically closes a connection when it has not been used for a certain time (a configurable timeout interval). HTTP/1.1 uses persistent connections by default and permits pipelining.
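The response-time reasoning above can be turned into a small back-of-the-envelope model. This is a simplified sketch under stated assumptions, not a measurement: connections are used one after another (no parallel connections), every object has the same transmission time, and processing delays are ignored. The function names and the numbers (RTT of 100 ms, 10 ms transmission time per object) are made up for illustration.

```python
def nonpersistent_time(num_objects, rtt, transmit_time):
    """Serial non-persistent connections: every object pays two RTTs."""
    # One RTT for the TCP handshake, one RTT for request/response,
    # plus the time to transmit the object itself.
    per_object = 2 * rtt + transmit_time
    return num_objects * per_object


def persistent_time(num_objects, rtt, transmit_time, pipelined=False):
    """One persistent connection: the handshake RTT is paid only once."""
    if pipelined and num_objects > 1:
        # One RTT for the base file, then one more RTT for all referenced
        # objects, which are requested back-to-back.
        request_rtts = 2
    else:
        # Each request waits for the previous response.
        request_rtts = num_objects
    return rtt + request_rtts * rtt + num_objects * transmit_time


if __name__ == "__main__":
    # The page from the example: 1 base HTML file + 10 JPEG images.
    for label, total in [
        ("non-persistent", nonpersistent_time(11, 100, 10)),
        ("persistent", persistent_time(11, 100, 10)),
        ("persistent + pipelining", persistent_time(11, 100, 10, pipelined=True)),
    ]:
        print(f"{label}: {total} ms")
```

For a single object both models reduce to two RTTs plus the transmission time, which matches the total response time stated earlier in this section.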

2.2.3 HTTP Message Format

  • There are two types of HTTP messages, request messages and response messages.
    HTTP Request Message
  • A typical HTTP request message:
    GET /somedir/page.html HTTP/1.1
    Host: www.someschool.edu
    Connection: close
    User-agent: Mozilla/5.0
    Accept-language: fr
  • The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines.
  • The request line has three fields: the method field, the URL field, and the HTTP version field.
    The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The GET method is used when the browser requests an object, with the requested object identified in the URL field.
  • The header line Host: www.someschool.edu specifies the host on which the object resides.
  • With the Connection: close header line, the browser tells the server that it doesn’t want persistent connections; it wants the server to close the connection after sending the requested object.
  • The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. The server can send different versions of the same object to different types of user agents. (Each of the versions is addressed by the same URL.)
  • The Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version.
  • After the header lines there is an “entity body.” The entity body is empty with the GET method, but is used with the POST method. An HTTP client often uses the POST method when the user fills out a form—for example, when a user provides search words to a search engine. With a POST message, the user is still requesting a Web page from the server, but the specific contents of the Web page depend on what the user entered into the form fields. If the value of the method field is POST, then the entity body contains what the user entered into the form fields.
  • When a server receives a request with the HEAD method, it responds with an HTTP message but it leaves out the requested object. Application developers often use the HEAD method for debugging.
  • The PUT method is often used in conjunction with Web publishing tools. It allows a user to upload an object to a specific path (directory) on a specific Web server. The PUT method is also used by applications that need to upload objects to Web servers.
  • The DELETE method allows a user, or an application, to delete an object on a Web server.
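The request message layout described above (a request line with three fields, header lines, a blank line, then an optional entity body) can be sketched as a small parser. This is an illustrative sketch only: parse_request is a hypothetical helper, it assumes lines are terminated by carriage return and line feed, and it skips the validation a real server would need.

```python
def parse_request(raw):
    """Split an HTTP request into request line fields, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ")  # the three request-line fields
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers, body


# The typical request message from the text, with explicit line endings.
REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/5.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)

if __name__ == "__main__":
    method, url, version, headers, body = parse_request(REQUEST)
    print(method, url, version)
    print(headers)
    print(repr(body))  # empty for this GET request
```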
    HTTP Response Message
  • A typical HTTP response message:
    HTTP/1.1 200 OK
    Connection: close
    Date: Tue, 09 Aug 2011 15:44:04 GMT
    Server: Apache/2.2.3 (CentOS)
    Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
    Content-Length: 6821
    Content-Type: text/html
    (data data data data data …)
  • The response message has three sections: an initial status line, six header lines, and then the entity body.
  • The entity body is the meat of the message—it contains the requested object itself (represented by data data data data data …).
  • The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object).
  • The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message.
  • The Date: header line indicates the time and date when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message, not the time when the object was created or last modified.
  • The Server: header line indicates that the message was generated by an Apache Web server.
  • The Last-Modified: header line indicates the time and date when the object was created or last modified.
  • The Content-Length: header line indicates the number of bytes in the object being sent.
  • The Content-Type: header line indicates that the object in the entity body is HTML text. The object type is officially indicated by the Content-Type: header and not by the file extension.
  • The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include:
    ***200 OK: Request succeeded and the information is returned in the response.
    ***301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in Location: header of the response message. The client software will automatically retrieve the new URL.
    ***400 Bad Request: The request could not be understood by the server.
    ***404 Not Found: The requested document does not exist on this server.
    ***505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.
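The three fields of the status line can be separated with a single split, and Python's standard http module already knows the phrases for the status codes listed above. A minimal sketch; parse_status_line is a hypothetical helper name.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Return (protocol version, status code, status message)."""
    # Split at most twice, because the message itself may contain spaces.
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 200 OK"))
    for code in (200, 301, 400, 404, 505):
        print(code, HTTPStatus(code).phrase)
```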

2.2.4 User-Server Interaction: Cookies

  • Cookies allow sites to keep track of users.
  • Cookie technology has four components:
    (1) a cookie header line in the HTTP response message;
    (2) a cookie header line in the HTTP request message;
    (3) a cookie file kept on the user’s end system and managed by the user’s browser;
    (4) a back-end database at the Web site.
  • Suppose Susan contacts Amazon.com for the first time and in the past she has already visited the eBay site.
  • When the request comes into the Amazon Web server, the server creates a unique identification number and creates an entry in its back-end database that is indexed by the identification number. The Amazon Web server then responds to Susan’s browser, including in the HTTP response a Set-cookie: header, which contains the identification number. For example, the header line might be: Set-cookie: 1678
  • When Susan’s browser receives the HTTP response message, it sees the Set-cookie: header. The browser then appends a line to the special cookie file that it manages. This line includes the hostname of the server and the identification number in the Set-cookie: header. Note that the cookie file already has an entry for eBay, since Susan has visited that site in the past. As Susan continues to browse the Amazon site, each time she requests a Web page, her browser consults her cookie file, extracts her identification number for this site, and puts a cookie header line that includes the identification number in the HTTP request. So, each of her HTTP requests to the Amazon server includes the header line: Cookie: 1678
  • The Amazon server is able to track Susan’s activity at the Amazon site. The Amazon Web site knows exactly which pages user 1678 visited, in which order, and at what times. Amazon uses cookies to provide its shopping cart service: it can maintain a list of all of Susan’s intended purchases.
  • If Susan returns to Amazon’s site one week later, her browser will continue to put the header line Cookie: 1678 in the request messages. Amazon also recommends products to Susan based on Web pages she has visited at Amazon in the past. If Susan also registers herself with Amazon—providing full name—Amazon can then include this information in its database, thereby associating Susan’s name with her identification number and all of the pages she has visited at the site in the past.
  • This is how e-commerce sites provide “one-click shopping”—when Susan chooses to purchase an item during a subsequent visit, she doesn’t need to re-enter her name, credit card number, or address.
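The four cookie components can be mimicked in a few lines: the two header lines, the browser's cookie file, and the site's back-end database. This is a toy simulation, not how a real browser or server is written. The class names, the host names, and the eBay identifier 8734 are assumptions for illustration, and the bare-number header value follows the simplified format used in these notes (real cookies carry name=value pairs).

```python
import itertools


class ToySite:
    """A Web site with a back-end database indexed by identification number."""

    def __init__(self, first_id):
        self._next_id = itertools.count(first_id)
        self.database = {}  # identification number -> pages visited, in order

    def handle(self, path, request_headers):
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie is None:
            # First visit: create an identifier and a database entry.
            cookie = str(next(self._next_id))
            self.database[cookie] = []
            response_headers["Set-cookie"] = cookie
        self.database.setdefault(cookie, []).append(path)
        return response_headers


class ToyBrowser:
    """A browser that manages a cookie file keyed by server hostname."""

    def __init__(self):
        self.cookie_file = {}  # hostname -> identification number

    def get(self, hostname, site, path):
        headers = {}
        if hostname in self.cookie_file:
            headers["Cookie"] = self.cookie_file[hostname]
        response = site.handle(path, headers)
        if "Set-cookie" in response:
            self.cookie_file[hostname] = response["Set-cookie"]


if __name__ == "__main__":
    amazon = ToySite(first_id=1678)
    browser = ToyBrowser()
    browser.cookie_file["www.ebay.com"] = "8734"  # hypothetical earlier visit
    browser.get("www.amazon.com", amazon, "/")
    browser.get("www.amazon.com", amazon, "/cart")
    print(browser.cookie_file)
    print(amazon.database)
```

HTTP itself stays stateless here: the state lives in the cookie file and in the site's database, and the identifier in each request ties the two together.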

2.2.5 Web Caching

  • A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage.
  • A user’s browser can be configured so that all of the user’s HTTP requests are first directed to the Web cache.
  • Suppose a browser is requesting the object http://www.someschool.edu/campus.gif. Here is what happens:
    1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.
    2. The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache returns the object within an HTTP response message to the client browser.
    3. If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu. The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache.
    4. When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser over the existing TCP connection between the client browser and the Web cache.
  • Reasons for web caching:
    First, a Web cache can substantially reduce the response time for a client request, particularly if the bandwidth between the client and the origin server is much less than the bandwidth between the client and the cache. If there is a high-speed connection between the client and the cache and if the cache has the requested object, then the cache will be able to deliver the object rapidly to the client.
    Second, Web caches can substantially reduce traffic on an institution’s access link to the Internet. By reducing traffic, the institution (a company or a university) does not have to upgrade bandwidth as quickly, thereby reducing costs. Furthermore, Web caches can substantially reduce Web traffic in the Internet as a whole, thereby improving performance for all applications.
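The four steps a Web cache follows can be sketched as a lookup with a fallback to the origin server. This is a toy model under assumptions: the class name ToyWebCache is made up, the origin server is represented by a plain function, and there is no expiry or eviction.

```python
class ToyWebCache:
    """Serve objects from local storage, fetching from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # stands in for the cache-to-server request
        self._store = {}                 # URL -> locally stored copy
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:           # step 2: a copy is stored locally
            self.hits += 1
            return self._store[url]
        self.misses += 1
        obj = self._fetch(url)           # step 3: ask the origin server
        self._store[url] = obj           # step 4: keep a copy, then respond
        return obj


if __name__ == "__main__":
    origin_requests = []

    def origin(url):
        origin_requests.append(url)
        return b"<image bytes>"

    cache = ToyWebCache(origin)
    cache.get("http://www.someschool.edu/campus.gif")
    cache.get("http://www.someschool.edu/campus.gif")
    print(cache.hits, cache.misses, len(origin_requests))
```

Only the first request reaches the origin server; the second is answered from the cache, which is the source of both the lower response time and the reduced access-link traffic.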

2.2.6 The Conditional GET

  • The conditional GET mechanism allows a cache to verify that its objects are up to date. An HTTP request message is a so-called conditional GET message if (1) the request message uses the GET method and (2) the request message includes an If-Modified-Since: header line.
  • First, on behalf of a requesting browser, a proxy cache sends a request message to a Web server:
    GET /fruit/kiwi.gif HTTP/1.1
    Host: www.fruit.com
  • Second, the Web server sends a response message with the requested object to the cache:
    HTTP/1.1 200 OK
    Date: Sat, 8 Oct 2011 15:39:29
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Wed, 7 Sep 2011 09:23:24
    Content-Type: image/gif
    (data data data data data …)
  • The cache forwards the object to the requesting browser but also caches the object locally. Importantly, the cache also stores the last-modified date along with the object.
  • Third, one week later, another browser requests the same object via the cache, and the object is still in the cache. The cache performs an up-to-date check by issuing a conditional GET. The cache sends:
    GET /fruit/kiwi.gif HTTP/1.1
    Host: www.fruit.com
    If-modified-since: Wed, 7 Sep 2011 09:23:24
  • This conditional GET is telling the server to send the object only if the object has been modified since the specified date. Suppose the object has not been modified since 7 Sep 2011 09:23:24.
  • Fourth, the Web server sends a response message to the cache:
    HTTP/1.1 304 Not Modified
    Date: Sat, 15 Oct 2011 15:39:29
    Server: Apache/1.3.0 (Unix)
    (empty entity body)
  • The response message has 304 Not Modified in the status line, which tells the cache that it can go ahead and forward its (the proxy cache’s) cached copy of the object to the requesting browser.
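The four-step exchange above can be imitated with two small classes: an origin server that compares its last-modified time with the one the cache supplies, and a cache that revalidates before reusing its copy. This is a simplified sketch with hypothetical class names. It passes datetime objects directly instead of formatting and parsing header strings, and the object contents are placeholders.

```python
from datetime import datetime


class ToyOriginServer:
    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified

    def get(self, if_modified_since=None):
        """Return (status, body, last_modified), like a tiny HTTP response."""
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified  # empty entity body
        return 200, self.body, self.last_modified


class RevalidatingCache:
    def __init__(self, origin):
        self.origin = origin
        self.entry = None  # (body, last_modified) once the object is cached

    def get(self):
        if self.entry is None:
            status, body, modified = self.origin.get()
        else:
            # Conditional GET: send the stored last-modified date.
            status, body, modified = self.origin.get(if_modified_since=self.entry[1])
        if status == 200:
            self.entry = (body, modified)  # store object and its date together
        return status, self.entry[0]


if __name__ == "__main__":
    origin = ToyOriginServer(b"kiwi-v1", datetime(2011, 9, 7, 9, 23, 24))
    cache = RevalidatingCache(origin)
    print(cache.get())  # first request: full object from the origin
    print(cache.get())  # later request: revalidated, served from the cache
```

The 304 response carries no object, so revalidation costs a round trip but almost no bandwidth.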

2.3 File Transfer: FTP

  • A typical FTP session: the user is sitting in front of one host (the local host) and wants to transfer files to or from a remote host. After the user provides a user identification and a password, the user can transfer files from the local file system to the remote file system and vice versa.
  • The user interacts with FTP through an FTP user agent. The user first provides the hostname of the remote host, causing the FTP client process in the local host to establish a TCP connection with the FTP server process in the remote host. The user then provides the user identification and password, which are sent over the TCP connection as part of FTP commands. Once the server has authorized the user, the user copies one or more files stored in the local file system into the remote file system (or vice versa).
  • FTP uses two parallel TCP connections to transfer a file, a control connection and a data connection. The control connection is used for sending control information between the two hosts, such as user identification, commands to change remote directory, and commands to “put” and “get” files. The data connection is used to actually send a file.
  • Because FTP uses a separate control connection, FTP is said to send its control information out-of-band. HTTP sends request and response header lines into the same TCP connection that carries the transferred file itself. For this reason, HTTP is said to send its control information in-band.
  • When a user starts an FTP session with a remote host, the client side of FTP (user) first initiates a control TCP connection with the server side (remote host) on server port number 21. The client side of FTP sends the user identification, password and commands to change the remote directory over this control connection. When the server side receives a command for a file transfer over the control connection, the server side initiates a TCP data connection to the client side. FTP sends exactly one file over the data connection and then closes the data connection. If during the same session, the user wants to transfer another file, FTP opens another data connection. Thus the control connection remains open throughout the duration of the user session, but a new data connection is created for each file transferred within a session.
  • Throughout a session, the FTP server must maintain state about the user: associate the control connection with a specific user account; keep track of the user’s current directory as the user wanders about the remote directory tree.

2.3.1 FTP Commands and Replies

  • The commands, from client to server, and replies, from server to client, are sent across the control connection in 7-bit ASCII format. In order to delineate successive commands, a carriage return and line feed end each command. Each command consists of three or four uppercase ASCII characters, some with optional arguments.
  • Some common commands:
    ***USER username: Used to send the user identification to the server.
    ***PASS password: Used to send the user password to the server.
    ***LIST: Used to ask the server to send back a list of all the files in the current remote directory. The list of files is sent over a new and non-persistent data connection rather than the control TCP connection.
    ***RETR filename: Used to retrieve (get) a file from the current directory of the remote host. This command causes the remote host to initiate a data connection and to send the requested file over the data connection.
    ***STOR filename: Used to store (put) a file into the current directory of the remote host.
  • There is typically a one-to-one correspondence between the command that the user issues and the FTP command sent across the control connection. Each command is followed by a reply, sent from server to client. The replies are three-digit numbers, with an optional message following the number.
  • Some typical replies, along with their possible messages:
    ***331 Username OK, password required
    ***125 Data connection already open; transfer starting
    ***425 Can’t open data connection
    ***452 Error writing file
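The wire format of the control connection (ASCII command lines ended by carriage return and line feed, and replies that start with a three-digit number) can be sketched with two helpers. This is an illustrative sketch: format_command and parse_reply are hypothetical names, the user name alice is made up, and multi-line replies are not handled.

```python
def format_command(verb, argument=None):
    """Build one command line as it would be sent on the control connection."""
    line = verb.upper() if argument is None else f"{verb.upper()} {argument}"
    return (line + "\r\n").encode("ascii")  # CR LF ends each command


def parse_reply(line):
    """Split a single-line reply into (three-digit code, optional message)."""
    text = line.decode("ascii").rstrip("\r\n")
    code, _, message = text.partition(" ")
    return int(code), message


if __name__ == "__main__":
    print(format_command("user", "alice"))
    print(format_command("list"))
    print(parse_reply(b"331 Username OK, password required\r\n"))
```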

2.4 Electronic Mail in the Internet

  • The Internet mail system has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP).
  • Microsoft Outlook is an example of a user agent for e-mail. When the sender (Alice) has finished composing her e-mail, her user agent sends it to her mail server, where the message is placed in the mail server’s outgoing message queue. When the recipient (Bob) wants to read a message, his user agent retrieves the message from his mailbox in his mail server.
  • Each recipient has a mailbox located in one of the mail servers that manages and maintains the messages that have been sent to him.
  • If Alice’s server cannot deliver mail to Bob’s server, Alice’s server holds the message in a message queue and attempts to transfer the message later. Reattempts are often done every 30 minutes or so; if there is no success after several days, the server removes the message and notifies the sender (Alice) with an e-mail message.
  • SMTP is the principal application-layer protocol for Internet electronic mail. It uses the reliable data transfer service of TCP to transfer mail from the sender’s mail server to the recipient’s mail server. SMTP has two sides: a client side, which executes on the sender’s mail server, and a server side, which executes on the recipient’s mail server.
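The hold-and-retry behaviour of the sending mail server described above can be modelled as a queue that is processed in passes. This is a toy model under assumptions: the class name OutgoingQueue is made up, each call to run_pass stands for one timed reattempt (for instance every 30 minutes), and a fixed attempt limit stands in for giving up after several days.

```python
from collections import deque


class OutgoingQueue:
    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.pending = deque()  # (message, failed attempts so far)
        self.bounced = []       # messages reported back to the sender

    def submit(self, message):
        self.pending.append((message, 0))

    def run_pass(self, try_deliver):
        """Try each pending message once; keep failures for a later pass."""
        for _ in range(len(self.pending)):
            message, attempts = self.pending.popleft()
            if try_deliver(message):
                continue  # delivered to the recipient's mail server
            attempts += 1
            if attempts >= self.max_attempts:
                self.bounced.append(message)  # give up and notify the sender
            else:
                self.pending.append((message, attempts))


if __name__ == "__main__":
    queue = OutgoingQueue(max_attempts=3)
    queue.submit("Hello Bob")
    queue.run_pass(lambda msg: False)  # Bob's server is down
    print(len(queue.pending))          # message is still held by Alice's server
    queue.run_pass(lambda msg: True)   # Bob's server is back
    print(len(queue.pending), queue.bounced)
```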

2.4.1 SMTP

  • Suppose Alice wants to send Bob a simple ASCII message.
    1. Alice invokes her user agent for e-mail, provides Bob’s e-mail address, composes a message, and instructs the user agent to send the message.
    2. Alice’s user agent sends the message to her mail server, where it is placed in a message queue.
    3. The client side of SMTP, running on Alice’s mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob’s mail server.
    4. After some initial SMTP handshaking, the SMTP client sends Alice’s message into the TCP connection.
    5. At Bob’s mail server, the server side of SMTP receives the message. Bob’s mail server then places the message in Bob’s mailbox.
    6. Bob invokes his user agent to read the message at his convenience.
  • SMTP does not use intermediate mail servers for sending mail. If Bob’s mail server is down, the message remains in Alice’s mail server and waits for a new attempt—the message does not get placed in some intermediate mail server.
  • SMTP transfers a message from client to server:
    The SMTP client has TCP establish a connection to port 25 at the SMTP server. If the server is down, the client tries again later. Once this connection is established, the server and client perform application-layer handshaking. During this SMTP handshaking phase, the SMTP client indicates the e-mail addresses of the sender and the recipient. After the handshake, the client sends the message. SMTP can count on the reliable data transfer service of TCP to get the message to the server without errors. The client then repeats this process over the same TCP connection if it has other messages to send to the server; otherwise, it instructs TCP to close the connection.
  • The following is an example transcript of the messages exchanged between an SMTP client (C) and an SMTP server (S). The hostname of the client is crepes.fr and the hostname of the server is hamburger.edu. The ASCII text lines prefaced with C: are the lines the client sends into its TCP socket, and the ASCII text lines prefaced with S: are the lines the server sends into its TCP socket. The transcript begins as soon as the TCP connection is established.
    S: 220 hamburger.edu
    C: HELO crepes.fr
    S: 250 Hello crepes.fr, pleased to meet you
    C: MAIL FROM: <alice@crepes.fr>
    S: 250 alice@crepes.fr … Sender ok
    C: RCPT TO: <bob@hamburger.edu>
    S: 250 bob@hamburger.edu … Recipient ok
    C: DATA
    S: 354 Enter mail, end with “.” on a line by itself
    C: Do you like ketchup?
    C: How about pickles?
    C: .
    S: 250 Message accepted for delivery
    C: QUIT
    S: 221 hamburger.edu closing connection
  • The client sends a message (“Do you like ketchup? How about pickles?”) from mail server crepes.fr to mail server hamburger.edu. The client issued five commands: HELO (abbreviation for HELLO), MAIL FROM, RCPT TO, DATA, and QUIT. The client sends a line consisting of a single period(.), which indicates the end of the message to the server. The server replies to each command, with each reply having a reply code and some (optional) English-language explanation.
  • SMTP uses persistent connections: If the sending mail server has several messages to send to the same receiving mail server, it can send all of the messages over the same TCP connection. For each message, the client begins the process with a new MAIL FROM command, marks the end of the message with a period on a line by itself, and issues QUIT only after all messages have been sent.
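The client side of this dialogue can be sketched as a function that produces the lines an SMTP client would send for one or more messages over a single connection. This is an illustration only: the function name and the `Message` tuple are hypothetical, server replies are not checked, and body lines that consist of a lone period would need escaping, which the sketch omits. Addresses are written in angle brackets, the form real SMTP servers expect.

```python
from typing import Iterable, List, Tuple

# (sender address, recipient address, body lines)
Message = Tuple[str, str, List[str]]


def smtp_client_lines(client_host: str, messages: Iterable[Message]) -> List[str]:
    """Return the lines an SMTP client sends for the given messages."""
    lines = [f"HELO {client_host}"]
    for sender, recipient, body in messages:
        # Each message starts again with MAIL FROM on the same connection.
        lines.append(f"MAIL FROM: <{sender}>")
        lines.append(f"RCPT TO: <{recipient}>")
        lines.append("DATA")
        lines.extend(body)
        lines.append(".")  # a period on a line by itself ends the message
    lines.append("QUIT")  # issued once, after the last message
    return lines
```

Passing two messages produces two MAIL FROM / RCPT TO / DATA groups but still only one HELO and one QUIT, which is the persistent-connection behaviour described above.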

2.4.2 Comparison with HTTP

  • SMTP requires each message, including the body of each message, to be in 7-bit ASCII format. If the message contains characters that are not 7-bit ASCII or contains binary data (such as an image file), then the message has to be encoded into 7-bit ASCII. HTTP does not impose this restriction.

2.4.3 Mail Message Formats

  • When an e-mail message is sent from one person to another, a header containing peripheral information precedes the body of the message itself. This peripheral information is contained in a series of header lines.
  • The header lines and the body of the message are separated by a blank line. Each header line contains readable text, consisting of a keyword followed by a colon followed by a value. Some of the keywords are required and others are optional. Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines. It is important to note that these header lines are different from the SMTP commands. The commands were part of the SMTP handshaking protocol; the header lines examined in this section are part of the mail message itself.
  • A typical message header looks like this:
    From: alice@crepes.fr
    To: bob@hamburger.edu
    Subject: Searching for the meaning of life.
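As a small illustration of the header/body split, Python's standard `email` package can parse such a message. The body text below is borrowed from the SMTP example earlier and is an assumption for illustration; `split_message` is a hypothetical helper name.

```python
from email import message_from_string

RAW_MESSAGE = (
    "From: alice@crepes.fr\n"
    "To: bob@hamburger.edu\n"
    "Subject: Searching for the meaning of life.\n"
    "\n"  # the blank line separates the header lines from the body
    "Do you like ketchup?\n"
    "How about pickles?\n"
)


def split_message(raw: str):
    """Return (header dict, body text) for a simple, non-multipart message."""
    msg = message_from_string(raw)
    headers = dict(msg.items())
    return headers, msg.get_payload()
```

Calling `split_message(RAW_MESSAGE)` returns the three header lines as a dictionary keyed by keyword (`From`, `To`, `Subject`) and the body as a single string.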

2.4.4 Mail Access Protocols

The path an e-mail message takes when it is sent from Alice to Bob is as follows.

  • Alice’s user agent uses SMTP to push the e-mail message into her mail server, then Alice’s mail server uses SMTP (as an SMTP client) to relay the e-mail message to Bob’s mail server. Alice’s mail server can repeatedly try to send the message to Bob’s mail server, say every 30 minutes, until Bob’s mail server becomes operational.
  • Bob’s user agent can’t use SMTP to obtain the messages because obtaining the messages is a pull operation, whereas SMTP is a push protocol. The problem is solved by introducing a mail access protocol that transfers messages from Bob’s mail server to his local PC. There are a number of popular mail access protocols, including Post Office Protocol Version 3 (POP3), Internet Message Access Protocol (IMAP), and HTTP.
    POP3
  • POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update.
  • Authorization phase: the user agent sends a username and a password to authenticate the user.
  • Transaction phase: the user agent retrieves messages and it can mark messages for deletion, remove deletion marks, and obtain mail statistics.
  • Update phase: occurs after the client has issued the quit command, ending the POP3 session. At this time, the mail server deletes the messages that were marked for deletion.
  • The authorization phase has two principal commands: user <username> and pass <password>.
  • In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply. There are two possible responses: +OK (sometimes followed by server-to-client data), indicating that the previous command was fine; and -ERR, indicating that something was wrong with the previous command.
  • A user agent using POP3 can be configured by the user to “download and delete” or to “download and keep.” The sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in.
  • In the download-and-delete mode, the user agent will issue the list, retr, and dele commands. Suppose the user has two messages in his or her mailbox. C: (standing for client) is the user agent and S: (standing for server) is the mail server.
    C: list
    S: 1 498
    S: 2 912
    S: .
    C: retr 1
    S: (blah blah …
    S: ……………..
    S: ……….blah)
    S: .
    C: dele 1
    C: retr 2
    S: (blah blah …
    S: ……………..
    S: ……….blah)
    S: .
    C: dele 2
    C: quit
    S: +OK POP3 server signing off
  • The user agent first asks the mail server to list the size of each of the stored messages. The user agent then retrieves and deletes each message from the server. After the authorization phase, the user agent employed only four commands: list, retr, dele, and quit. After processing the quit command, the POP3 server enters the update phase and removes messages 1 and 2 from the mailbox.
  • The download-and-delete mode partitions Bob’s mail messages over different machines: if Bob first reads a message on his office PC, he will not be able to reread the message from his portable at home later in the evening. In the download-and-keep mode, the user agent leaves the messages on the mail server after downloading them. So Bob can reread messages from different machines.
  • During a POP3 session between a user agent and the mail server, the POP3 server keeps track of which user messages have been marked deleted. However, the POP3 server does not carry state information across POP3 sessions.
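The transaction and update phases can be modelled with a small in-memory sketch. This is not a real POP3 implementation: the class and method names are hypothetical, authorization is skipped, and message sizes are counted in characters. It only shows that deletion marks live inside one session and take effect at quit.

```python
class Pop3Session:
    """Toy model of one POP3 session over a shared mailbox dict."""

    def __init__(self, mailbox):
        self.mailbox = mailbox  # message number -> message text
        self.marked = set()     # deletion marks, kept for this session only

    def list_sizes(self):
        """Like 'list': (number, size) for every message not marked deleted."""
        return [(n, len(text)) for n, text in sorted(self.mailbox.items())
                if n not in self.marked]

    def retr(self, number):
        """Like 'retr': return the text of one message."""
        if number in self.marked or number not in self.mailbox:
            raise KeyError(f"-ERR no such message {number}")
        return self.mailbox[number]

    def dele(self, number):
        """Like 'dele': only mark the message; nothing is removed yet."""
        self.marked.add(number)

    def quit(self):
        """Like 'quit': the update phase removes the marked messages."""
        for number in self.marked:
            self.mailbox.pop(number, None)
        self.marked.clear()


def download(session, delete_after):
    """Download-and-delete (True) or download-and-keep (False)."""
    texts = []
    for number, _size in session.list_sizes():
        texts.append(session.retr(number))
        if delete_after:
            session.dele(number)
    session.quit()
    return texts
```

With `delete_after=False` the mailbox is unchanged after the session, so a second machine can fetch the same messages; with `delete_after=True` the mailbox is empty once `quit` has run.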
    IMAP
  • The POP3 protocol does not allow a user to create remote folders and assign messages to folders.
  • An IMAP server associates each message with a folder: when a message first arrives at the server, it is associated with the recipient’s INBOX folder. The recipient can then move the message into a new, user-created folder, read the message, delete the message, and so on.
  • The IMAP protocol allows users to create folders and move messages from one folder to another. It also allows users to search remote folders for messages matching specific criteria.
  • An IMAP server maintains user state information across IMAP sessions, for example the names of the folders and which messages are associated with which folders.
  • IMAP has commands that permit a user agent to obtain components of messages. For example, a user agent can obtain just the message header of a message or just one part of a multipart MIME message.
    Web-Based E-Mail
  • With Web-based e-mail, the user agent is an ordinary Web browser, and the user communicates with its remote mailbox via HTTP. When a recipient(Bob) wants to access a message in his mailbox, the e-mail message is sent from Bob’s mail server to Bob’s browser using the HTTP protocol rather than the POP3 or IMAP protocol.
  • When a sender, such as Alice, wants to send an e-mail message, the e-mail message is sent from her browser to her mail server over HTTP rather than over SMTP. Alice’s mail server still sends messages to, and receives messages from, other mail servers using SMTP.

2.5 DNS—The Internet’s Directory Service

  • A host is identified both by a hostname (such as www.yahoo.com or gaia.cs.umass.edu) and by an IP address. An IP address consists of four bytes and has a rigid hierarchical structure. It looks like 121.7.106.83, where each period separates one of the bytes, expressed in decimal notation from 0 to 255.

2.5.1 Services Provided by DNS

  • The Internet’s domain name system (DNS) translates hostnames to IP addresses. The DNS is (1) a distributed database implemented in a hierarchy of DNS servers, and (2) an application-layer protocol that allows hosts to query the distributed database.
  • DNS is commonly employed by other application-layer protocols—including HTTP, SMTP, and FTP—to translate user-supplied hostnames to IP addresses.
  • E.g.: a browser (an HTTP client), running on some user’s host, requests the URL www.someschool.edu/index.html. In order for the user’s host to be able to send an HTTP request message to the Web server www.someschool.edu, the user’s host must first obtain the IP address of www.someschool.edu. This is done as follows.
    1. The same user machine runs the client side of the DNS application.
    2. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client side of the DNS application.
    3. The DNS client sends a query containing the hostname to a DNS server.
    4. The DNS client receives a reply, which includes the IP address for the hostname.
    5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.
  • The desired IP address is often cached in a “nearby” DNS server, which helps to reduce DNS network traffic as well as the average DNS delay.
  • DNS provides a few other important services in addition to translating hostnames to IP addresses:
    - Host aliasing. A host with a complicated hostname can have one or more alias names. The original hostname is said to be a canonical hostname. Alias hostnames are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname and the IP address of the host for a supplied alias hostname.
    - Mail server aliasing. For example, if Bob has an account with Hotmail, Bob’s e-mail address might be bob@hotmail.com, but the hostname of the Hotmail mail server is more complicated. DNS can be invoked by a mail application to obtain the canonical hostname and the IP address of the host for a supplied alias hostname. In fact, the MX record (see below) permits a company’s mail server and Web server to have identical (aliased) hostnames; for example, a company’s Web server and mail server can both be called enterprise.com.
    - Load distribution. DNS is used to perform load distribution among replicated servers. Busy sites are replicated over multiple servers, with each server running on a different end system and each having a different IP address. For replicated Web servers, a set of IP addresses is associated with one canonical hostname. The DNS database contains this set of IP addresses. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses, but rotates the ordering of the addresses within each reply. Because a client typically sends its HTTP request message to the IP address that is listed first in the set, DNS rotation distributes the traffic among the replicated servers.
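The rotation used for load distribution can be sketched in a few lines. The class name is hypothetical and the addresses come from a range reserved for documentation; a real DNS server is more involved.

```python
from collections import deque


class RotatingRecordSet:
    """Returns the full address set, rotated by one position per reply."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def reply(self):
        answer = list(self._addresses)  # the entire set goes into the reply
        self._addresses.rotate(-1)      # the next reply starts one address later
        return answer


# Example addresses (assumed, from the 192.0.2.0/24 documentation range).
replicas = RotatingRecordSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
```

Since clients usually pick the first address in each reply, successive calls to `replicas.reply()` send successive clients to different replicas in round-robin order.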

2.5.2 Overview of How DNS Works

  • DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world. No single DNS server holds all of the mappings; instead, the mappings are distributed across the DNS servers.
  • There are three classes of DNS servers: root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers.
  • Suppose a DNS client wants to determine the IP address for the hostname www.amazon.com. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for the hostname www.amazon.com.
  • Root DNS servers. In the Internet there are 13 root DNS “servers” (labeled A through M), most of which are located in North America. Each “server” is actually a network of replicated servers, for both security and reliability purposes.
  • Top-level domain (TLD) servers. These servers are responsible for top-level domains such as com, org, net, edu, and gov, and all of the country top-level domains such as uk, fr, ca, and jp.
  • Authoritative DNS servers. Every organization with publicly accessible hosts (such as Web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization’s authoritative DNS server houses these DNS records. An organization can choose to implement its own authoritative DNS server to hold these records or the organization can pay to have these records stored in an authoritative DNS server of some service provider.
  • Another type of DNS server, the local DNS server, does not strictly belong to the hierarchy of servers but is also important. Each ISP has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers. A host’s local DNS server is typically “close to” the host.
  • When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy.
  • Suppose the host cis.poly.edu desires the IP address of gaia.cs.umass.edu. Suppose also that Polytechnic’s local DNS server is called dns.poly.edu and that an authoritative DNS server for gaia.cs.umass.edu is called dns.umass.edu.
  • The host cis.poly.edu first sends a DNS query message to its local DNS server, dns.poly.edu. The query message contains the hostname(gaia.cs.umass.edu) to be translated.
  • The local DNS server forwards the query message to a root DNS server. The root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu.
  • The local DNS server then resends the query message to one of these TLD servers. The TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts(dns.umass.edu).
  • The local DNS server resends the query message directly to dns.umass.edu, which responds with the IP address of gaia.cs.umass.edu.
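The query chain above can be imitated with three lookup tables standing in for the root, TLD, and authoritative servers. Everything here is a simplified assumption: the table layout, the name `tld-edu`, and the address 192.0.2.10 are made up for illustration, and real servers exchange DNS messages instead of dictionary lookups.

```python
# Root server: top-level domain -> name of a TLD server (hypothetical name).
ROOT = {"edu": "tld-edu"}
# TLD servers: domain -> authoritative DNS server for that domain.
TLD_SERVERS = {"tld-edu": {"umass.edu": "dns.umass.edu"}}
# Authoritative servers: hostname -> IP address (address is made up).
AUTHORITATIVE = {"dns.umass.edu": {"gaia.cs.umass.edu": "192.0.2.10"}}


def resolve_via_local_server(hostname):
    """Return (address, servers contacted) as the local DNS server would."""
    contacted = []
    labels = hostname.split(".")

    contacted.append("root")
    tld_server = ROOT[labels[-1]]            # root refers us to a TLD server

    contacted.append(tld_server)
    domain = ".".join(labels[-2:])           # e.g. umass.edu
    auth_server = TLD_SERVERS[tld_server][domain]

    contacted.append(auth_server)
    address = AUTHORITATIVE[auth_server][hostname]
    return address, contacted
```

For `gaia.cs.umass.edu` the local server contacts the root server, then the edu TLD server, then `dns.umass.edu`, and only the last of these returns the actual address.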
    DNS Caching
  • DNS exploits DNS caching in order to improve the delay performance and to reduce the number of DNS messages ricocheting around the Internet.
  • In a query chain, when a DNS server receives a DNS reply (for example containing a mapping from a hostname to an IP address), it can cache the mapping in its local memory. For example, each time the local DNS server dns.poly.edu receives a reply from some DNS server, it can cache any of the information contained in the reply. If a hostname/IP address pair is cached in a DNS server and another query arrives to the DNS server for the same hostname, the DNS server can provide the desired IP address. Because hosts and mappings between hostnames and IP addresses are not permanent, DNS servers discard cached information after a period of time (often set to two days).
  • A local DNS server can also cache the IP addresses of TLD servers, allowing the local DNS server to bypass the root DNS servers in a query chain (this often happens).
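A cache with expiry can be sketched as follows. The class name is hypothetical, the two-day default mirrors the figure mentioned above, and the clock is passed in so that the behaviour can be checked without waiting; real resolvers use the TTL carried in each record.

```python
import time

TWO_DAYS = 2 * 24 * 60 * 60  # seconds


class DnsCache:
    """Maps hostnames to addresses and forgets them after ttl_seconds."""

    def __init__(self, ttl_seconds=TWO_DAYS, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # hostname -> (address, expiry time)

    def put(self, hostname, address):
        self._entries[hostname] = (address, self._clock() + self._ttl)

    def get(self, hostname):
        """Return the cached address, or None if absent or expired."""
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]  # discard stale information
            return None
        return address
```

A local DNS server would consult `get` before forwarding a query into the hierarchy and call `put` whenever a reply arrives.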

2.5.3 DNS Records and Messages

  • The DNS servers that together implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records.
  • A resource record is a four-tuple that contains the following fields: (Name, Value, Type, TTL)
  • TTL is the time to live of the resource record; it determines when a resource should be removed from a cache. In the example records given below, we ignore the TTL field.
    Name | Value | Type | Example
    hostname | the hostname’s IP address | A | (relay1.bar.foo.com, 145.37.93.126, A)
    domain | hostname of an authoritative DNS server that knows IP addresses for hosts in the domain | NS | (foo.com, dns.foo.com, NS)
    alias hostname | canonical hostname | CNAME | (foo.com, relay1.bar.foo.com, CNAME)
    alias hostname | canonical name of a mail server | MX | (foo.com, mail.bar.foo.com, MX)
  • By using the MX record, a company can have the same aliased name for its mail server and for one of its other servers (such as its Web server). To obtain the canonical name for the mail server, a DNS client would query for an MX record; to obtain the canonical name for the other server, the DNS client would query for the CNAME record.
  • If a DNS server is authoritative for a particular hostname, then the DNS server will contain a Type A record for the hostname. If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the DNS server in the Value field of the NS record.
    Suppose an edu TLD server is not authoritative for the host gaia.cs.umass.edu. Then this server will contain a record for a domain that includes the host gaia.cs.umass.edu, (umass.edu, dns.umass.edu, NS). The edu TLD server would also contain a Type A record, which maps the DNS server dns.umass.edu to an IP address, for example, (dns.umass.edu, 128.119.40.111, A).
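To make the record types concrete, the example records from this section can be stored as tuples and filtered by name and type. This is only a sketch: the `ResourceRecord` type and `query` function are hypothetical, and the TTL field is left out just as in the examples above.

```python
from typing import List, NamedTuple


class ResourceRecord(NamedTuple):
    name: str
    value: str
    rtype: str


RECORDS = [
    ResourceRecord("relay1.bar.foo.com", "145.37.93.126", "A"),
    ResourceRecord("foo.com", "dns.foo.com", "NS"),
    ResourceRecord("foo.com", "relay1.bar.foo.com", "CNAME"),
    ResourceRecord("foo.com", "mail.bar.foo.com", "MX"),
]


def query(records: List[ResourceRecord], name: str, rtype: str) -> List[str]:
    """Return the values of all records matching the name and type."""
    return [r.value for r in records if r.name == name and r.rtype == rtype]
```

Querying `foo.com` with type `MX` gives the mail server's canonical name, while the same name with type `CNAME` gives the canonical name of the other server, which is how one alias can serve both purposes.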
    DNS Messages
  • There are only two kinds of DNS messages, query messages and reply messages, and both have the same format.
    The semantics of the various fields in a DNS message are as follows:
  • The first 12 bytes are the header section.
    1. The first field is a 16-bit number that identifies the query. This identifier is copied into the reply message to a query, allowing the client to match received replies with sent queries.
    2. There are a number of flags in the flag field.
      A 1-bit query/reply flag indicates whether the message is a query (0) or a reply (1).
      A 1-bit authoritative flag is set in a reply message when a DNS server is an authoritative server for a queried name.
      A 1-bit recursion-desired flag is set when a client (host or DNS server) desires that the DNS server perform recursion when it doesn’t have the record.
      A 1-bit recursion-available field is set in a reply if the DNS server supports recursion.
    3. Four number-of fields. These fields indicate the number of occurrences of the four types of data sections that follow the header.
  • The question section contains information about the query that is being made. This section includes (1) a name field that contains the name that is being queried, and (2) a type field that indicates the type of question being asked about the name—for example, a host address associated with a name (Type A) or the mail server for a name (Type MX).
  • The answer section contains the resource records for the name that was originally queried. A reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses (for example: replicated Web servers).
  • The authority section contains records of other authoritative servers.
  • The additional section contains other helpful records. For example, the answer field in a reply to an MX query contains a resource record providing the canonical hostname of a mail server. The additional section contains a Type A record providing the IP address for the canonical hostname of the mail server.
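The 12-byte header can be packed and unpacked with Python's `struct` module. The bit positions of the flags are taken from the DNS specification (RFC 1035), not from these notes, and the function and key names are hypothetical.

```python
import struct

# Flag bit masks within the 16-bit flags field (per RFC 1035).
QR_REPLY = 0x8000             # 0 = query, 1 = reply
AUTHORITATIVE = 0x0400        # set by an authoritative server in a reply
RECURSION_DESIRED = 0x0100    # set by the client
RECURSION_AVAILABLE = 0x0080  # set in a reply if recursion is supported

HEADER_FORMAT = "!HHHHHH"     # six unsigned 16-bit fields, network byte order


def pack_header(query_id, flags, questions=0, answers=0, authority=0, additional=0):
    """Build the 12-byte header: identifier, flags, four number-of fields."""
    return struct.pack(HEADER_FORMAT, query_id, flags,
                       questions, answers, authority, additional)


def unpack_header(data):
    """Decode the first 12 bytes of a DNS message."""
    query_id, flags, qd, an, ns, ar = struct.unpack(HEADER_FORMAT, data[:12])
    return {
        "id": query_id,
        "is_reply": bool(flags & QR_REPLY),
        "authoritative": bool(flags & AUTHORITATIVE),
        "recursion_desired": bool(flags & RECURSION_DESIRED),
        "recursion_available": bool(flags & RECURSION_AVAILABLE),
        "counts": (qd, an, ns, ar),
    }
```

A query with one question and recursion desired would be built as `pack_header(0x1234, RECURSION_DESIRED, questions=1)`; the matching reply copies the same identifier and sets `QR_REPLY`.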
    Inserting Records into the DNS Database
  • Suppose you have created a new company called Network Utopia and want to register the domain name networkutopia.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database, and collects a small fee from you for its services.
  • When you register, you need to provide the registrar with the names and IP addresses of your primary and secondary authoritative DNS servers. Suppose the names and IP addresses are dns1.networkutopia.com, dns2.networkutopia.com, 212.212.212.1, and 212.212.212.2. For each of these two authoritative DNS servers, the registrar would then make sure that a Type NS and a Type A record are entered into the TLD com servers. E.g.: for the primary authoritative server for networkutopia.com, the registrar would insert the following two resource records into the DNS system:
    (networkutopia.com, dns1.networkutopia.com, NS)
    (dns1.networkutopia.com, 212.212.212.1, A)
  • You’ll also have to make sure that the Type A resource record for your Web server www.networkutopia.com and the Type MX resource record for your mail server mail.networkutopia.com are entered into your authoritative DNS servers. (An UPDATE option has been added to the DNS protocol to allow data to be dynamically added or deleted from the database via DNS messages.)
  • After these steps are completed, people will be able to visit your Web site and send e-mail to the employees at your company. Suppose Alice wants to view the Web page www.networkutopia.com. Her host will first send a DNS query to her local DNS server. The local DNS server will then contact a TLD com server. (The local DNS server will also have to contact a root DNS server if the address of a TLD com server is not cached.) Because this TLD com server contains the Type NS and Type A resource records listed above, it sends a reply to Alice’s local DNS server containing the two resource records. The local DNS server then sends a DNS query to 212.212.212.1, asking for the Type A record corresponding to www.networkutopia.com. This record provides the IP address of the desired Web server, say 212.212.71.4, which the local DNS server passes back to Alice’s host. Alice’s browser can now initiate a TCP connection to the host 212.212.71.4 and send an HTTP request over the connection.

2.6 Peer-to-Peer Applications

2.6.1 P2P File Distribution

  • In client-server file distribution, the server must send a copy of the file to each of the peers—placing burden on the server and consuming server bandwidth.
    In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process.
  • Suppose:
    u_s: upload rate of the server; u_i: upload rate of the ith peer; d_i: download rate of the ith peer; F: size of the file to be distributed (in bits); N: number of peers that want to obtain a copy of the file.
  • The distribution time is the time it takes to get a copy of the file to all N peers. D_cs denotes the distribution time for the client-server architecture, and D_P2P denotes the distribution time for the P2P architecture.
    Client-server architecture:
  • The server must transmit one copy of the file to each of the N peers, so the server must transmit N*F bits. The time to distribute the file must be at least N*F/u_s.
  • Let d_min = min{d_1, d_2, …, d_N}. The peer with the lowest download rate (d_min) cannot obtain all F bits of the file in less than F/d_min seconds. The minimum distribution time is at least F/d_min.
    D_cs = max{ N*F/u_s, F/d_min }
  • For N large enough, the client-server distribution time is given by N*F/u_s. So the distribution time increases linearly with the number of peers N.
    P2P architecture:
  • When a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers.
  • At the beginning of the distribution, the server must send each bit of the file at least once into the community of peers. The minimum distribution time is at least F/u_s.
  • The peer with the lowest download rate cannot obtain all F bits of the file in less than F/d_min seconds. The minimum distribution time is at least F/d_min.
  • The total upload capacity of the system is equal to the upload rate of the server plus the upload rates of each of the individual peers, u_total = u_s + u_1 + … + u_N. The system must upload F bits to each of the N peers, thus uploading N*F bits. The minimum distribution time is also at least N*F/(u_s + u_1 + … + u_N).
    D_P2P = max{ F/u_s, F/d_min, N*F/(u_s + u_1 + … + u_N) }
  • We imagine that each peer can redistribute a bit as soon as it receives the bit. In reality, where chunks of the file are redistributed rather than individual bits, the expression for D_P2P above serves as a good approximation of the actual minimum distribution time.
  • For the client-server architecture, the distribution time increases linearly and without bound as the number of peers increases. For the P2P architecture, the minimal distribution time is never greater than that of the client-server architecture, because every peer that joins also adds its own upload capacity to the system.
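The two distribution-time expressions translate directly into code. The function and parameter names are hypothetical, and the rates in the usage note are invented for illustration; bits and bits per second are assumed throughout.

```python
def client_server_time(file_bits, server_up, download_rates):
    """Minimum distribution time when only the server uploads."""
    n = len(download_rates)
    return max(n * file_bits / server_up,
               file_bits / min(download_rates))


def p2p_time(file_bits, server_up, upload_rates, download_rates):
    """Minimum distribution time when peers also upload."""
    n = len(download_rates)
    total_up = server_up + sum(upload_rates)
    return max(file_bits / server_up,
               file_bits / min(download_rates),
               n * file_bits / total_up)
```

For instance, with a 1000-bit file, a server upload rate of 100, and ten peers that each upload at 50 and download at 200, the client-server bound is dominated by the server term N*F divided by the server rate, while the P2P bound is dominated by the total-upload-capacity term and is several times smaller.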
    BitTorrent
  • BitTorrent is a P2P protocol for file distribution. The collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. While it downloads chunks it also uploads chunks to other peers. Once a peer has acquired the entire file, it may leave the torrent, or remain in the torrent and continue to upload chunks to other peers. Any peer may leave the torrent at any time with only a subset of chunks, and later rejoin the torrent.
  • Each torrent has a node called a tracker. When a peer joins a torrent, it registers itself with the tracker and periodically informs the tracker that it is still in the torrent. In this manner, the tracker keeps track of the peers that are participating in the torrent. A given torrent may have fewer than ten or more than a thousand peers participating at any instant of time.
  • When a new peer, Alice, joins the torrent, the tracker randomly selects a subset of peers (say 50) from the set of participating peers, and sends the IP addresses of these 50 peers to Alice. Possessing this list of peers, Alice attempts to establish concurrent TCP connections with all the peers on this list. Call all the peers with which Alice succeeds in establishing a TCP connection “neighboring peers.” (Alice is shown to have 3 neighboring peers.) As time evolves, some of these peers may leave and other peers (outside the initial 50) may attempt to establish TCP connections with Alice. So a peer’s neighboring peers will change over time.
  • At any given time, each peer will have a subset of chunks from the file, with different peers having different subsets. Alice will ask each of her neighboring peers (over the TCP connections) for the list of the chunks they have. If Alice has L different neighbors, she will obtain L lists of chunks. So Alice will issue requests (over the TCP connections) for chunks she currently does not have. So at any given instant of time, Alice will have a subset of chunks and will know which chunks her neighbors have.
  • With this information, Alice will have two decisions to make. First, which chunks should she request first from her neighbors? Second, to which of her neighbors should she send requested chunks?
  • First: Alice uses a technique called rarest first. The idea is to determine, from among the chunks she does not have, the chunks that are the rarest among her neighbors (that is, the chunks that have the fewest repeated copies among her neighbors) and then request those rarest chunks first. So the rarest chunks get more quickly redistributed, aiming to equalize the numbers of copies of each chunk in the torrent.
  • Second: For each of her neighbors, Alice continually measures the rate at which she receives bits and determines the four peers that are feeding her bits at the highest rate. She then reciprocates by sending chunks to these same four peers. Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers. These four peers are said to be unchoked. Every 30 seconds, she picks one additional neighbor at random and sends it chunks (say Bob). Bob is said to be optimistically unchoked. That is, every 30 seconds, Alice will randomly choose a new trading partner and initiate trading with that partner. If the two peers are satisfied with the trading, they will put each other in their top four lists and continue trading with each other until one of the peers finds a better partner. The random neighbor selection also allows new peers to get chunks. All other neighboring peers besides these five peers are “choked,” that is, they do not receive any chunks from Alice.
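Both of Alice's decisions can be sketched as small functions. This is an illustration under simplifying assumptions, not the BitTorrent implementation: the function names are hypothetical, ties are broken by chunk index or peer name, and the 10-second and 30-second timers are left to the caller.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice lacks from rarest to most common among neighbors."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(set(chunks))  # one copy counted per neighbor
    wanted = [c for c in counts if c not in my_chunks]
    return sorted(wanted, key=lambda c: (counts[c], c))


def top_uploaders(receive_rates, top_n=4):
    """Neighbors currently feeding Alice at the highest rates (unchoked)."""
    ranked = sorted(receive_rates, key=lambda peer: (-receive_rates[peer], peer))
    return ranked[:top_n]


def optimistic_unchoke(neighbors, unchoked, rng=random):
    """Pick one additional neighbor at random from the choked ones."""
    choked = sorted(set(neighbors) - set(unchoked))
    return rng.choice(choked) if choked else None
```

Alice would call `rarest_first` whenever she decides what to request next, `top_uploaders` every 10 seconds, and `optimistic_unchoke` every 30 seconds.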

2.6.2 Distributed Hash Tables (DHTs)

  • In the P2P system, each peer will only hold a small subset of the totality of the (key, value) pairs. We’ll allow any peer to query the distributed database with a particular key. The distributed database will then locate the peers that have the corresponding (key, value) pairs and return the key-value pairs to the querying peer. Any peer will also be allowed to insert new key-value pairs into the database. Such a distributed database is referred to as a distributed hash table (DHT).
  • First assign an identifier to each peer, where each identifier is an integer in the range [0, 2n - 1] for some fixed n. Let’s also require each key to be an integer in the same range. To create integers of key that are not integers, we will use a hash function that maps each key (e.g., social security number) to an integer in the range [0, 2n - 1].
  • Next assign keys to peers. Given that each peer has an integer identifier and that each key is also an integer in the same range, so assign each (key, value) pair to the peer whose identifier is the closest to the key. Definition of closest: if the key is exactly equal to one of the peer identifiers, we store the (key, value) pair in that matching peer; otherwise define the closest peer as the closest successor of the key. If the key is larger than all the peer identifiers, we use a modulo-2n convention, storing the (key, value) pair in the peer with the smallest identifier.
  • Now suppose a peer, Alice, wants to insert a (key, value) pair into the DHT. She first determines the peer whose identifier is closest to the key; she then sends a message to that peer, instructing it to store the (key, value) pair. But how does Alice determine the peer that is closest to the key? If Alice were to keep track of all the peers in the system (peer IDs and corresponding IP addresses), she could locally determine the closest peer. But such an approach requires each peer to keep track of all other peers in the DHT, which does not scale to a large system.
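
The assignment rule can be made concrete with a small Python sketch. It assumes n = 4, uses SHA-256 as the hash function (the notes do not name one), and the helper names hash_key and responsible_peer are hypothetical. It also assumes the caller knows every peer identifier, which is exactly the global knowledge that does not scale; the sketch only illustrates the "closest successor" rule itself.

```python
import hashlib
from bisect import bisect_left

N_BITS = 4  # assumed n; identifiers and keys live in [0, 2**N_BITS - 1]

def hash_key(key, n_bits=N_BITS):
    """Map an arbitrary string key to an integer in [0, 2**n_bits - 1]."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)

def responsible_peer(key_id, peer_ids):
    """Return the peer that stores key_id: an exact match if one exists,
    otherwise the closest successor, wrapping around to the smallest
    identifier when key_id is larger than every peer identifier."""
    ordered = sorted(peer_ids)
    # Index of the first peer identifier that is >= key_id.
    i = bisect_left(ordered, key_id)
    # i == len(ordered) means no such peer exists, so wrap to index 0.
    return ordered[i % len(ordered)]

peers = [1, 3, 4, 5, 8, 10, 12, 15]
print(responsible_peer(11, peers))   # closest successor of key 11
print(responsible_peer(5, peers))    # exact match
print(responsible_peer(14, [1, 3, 4, 5, 8, 10, 12]))  # wraps around
print(responsible_peer(hash_key("123-45-6789"), peers))
```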
    Circular DHT
  • In this circular arrangement, each peer only keeps track of its immediate successor and immediate predecessor (modulo 2^n). E.g.: n = 4 and there are eight peers with identifiers 1, 3, 4, 5, 8, 10, 12, and 15. So, peer 5 knows the IP address and identifier for peers 8 and 4 but does not necessarily know anything about any other peers in the DHT.
  • Using the circular overlay, the origin peer (peer 3) creates a message saying “Who is responsible for key 11?” and sends this message clockwise around the circle. Whenever a peer receives such a message, because it knows the identifier of its successor and predecessor, it can determine whether it is responsible for (that is, closest to) the key in question. If a peer is not responsible for the key, it simply sends the message to its successor. So, when peer 4 receives the message asking about key 11, it determines that it is not responsible for the key (because its successor is closer to the key), so it passes the message to peer 5. This process continues until the message arrives at peer 12, who determines that it is the closest peer to key 11. At this point, peer 12 can send a message back to the querying peer, peer 3, indicating that it is responsible for key 11.
  • Although each peer is only aware of two neighboring peers, in the worst case all N nodes in the DHT have to forward the message around the circle to find the node responsible for a key; on average, N/2 messages are sent.
  • In designing a DHT, there is a trade-off between the number of neighbors each peer has to track and the number of messages that the DHT needs to send to resolve a single query.
    On one hand, if each peer tracks all other peers, then only one message is sent per query, but each peer has to keep track of N peers.
    On the other hand, with a circular DHT, each peer is only aware of two peers, but N/2 messages are sent on average for each query.
  • One solution is to use the circular overlay as a foundation, but add “shortcuts” so that each peer not only keeps track of its immediate successor and predecessor, but also of a relatively small number of shortcut peers scattered about the circle.
  • When a peer receives a message that is querying for a key, it forwards the message to the neighbor (successor neighbor or one of the shortcut neighbors) that is closest to the key. So, when peer 4 receives the message asking about key 11, it determines that the closest peer to the key (among its neighbors) is its shortcut neighbor 10 and then forwards the message directly to peer 10.
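
The lookup with and without shortcuts can be simulated as follows. This is a rough sketch under stated assumptions: the function names are invented, the ring is built from a global list only to set up each peer's successor and predecessor, and "closest to the key" is interpreted as "the neighbor with the smallest clockwise distance to the key without passing it", which is one plausible reading of the rule above.

```python
PEERS = [1, 3, 4, 5, 8, 10, 12, 15]

def in_interval(x, a, b, size):
    """True if x lies in the circular interval (a, b] modulo size."""
    if a == b:  # a single peer owns the whole circle
        return True
    return 0 < (x - a) % size <= (b - a) % size

def lookup(origin, key, peers, shortcuts=None, n_bits=4):
    """Return the list of peers visited while resolving `key` from `origin`."""
    size = 2 ** n_bits
    ring = sorted(peers)
    succ = {p: ring[(i + 1) % len(ring)] for i, p in enumerate(ring)}
    pred = {p: ring[i - 1] for i, p in enumerate(ring)}
    shortcuts = shortcuts or {}
    current, path = origin, [origin]
    # A peer is responsible for the key if the key lies in (predecessor, itself].
    while not in_interval(key, pred[current], current, size):
        if in_interval(key, current, succ[current], size):
            # The immediate successor is responsible: hand over directly.
            nxt = succ[current]
        else:
            # Otherwise forward to the known neighbor that gets closest
            # to the key (smallest remaining clockwise distance).
            candidates = [succ[current]] + list(shortcuts.get(current, []))
            nxt = min(candidates, key=lambda p: (key - p) % size)
        current = nxt
        path.append(current)
    return path

plain = lookup(3, 11, PEERS)
with_shortcut = lookup(3, 11, PEERS, shortcuts={4: [10]})
print(plain, len(plain) - 1, "messages")
print(with_shortcut, len(with_shortcut) - 1, "messages")
```

Each hop in the returned path corresponds to one forwarded message, so comparing the two path lengths shows how a single shortcut at peer 4 shortens the query.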
    Peer Churn
  • In P2P systems, a peer can come or go without warning. Thus, we must maintain the DHT overlay in the presence of such peer churn. To handle peer churn, we will now require each peer to track (that is, know the IP address of) its first and second successors; for example, peer 4 now tracks both peer 5 and peer 8. We also require each peer to periodically verify that its two successors are alive (by periodically sending ping messages to them and asking for responses).
  • Suppose peer 5 abruptly leaves. In this case, the two peers preceding the departed peer (4 and 3) learn that 5 has departed, since it no longer responds to ping messages. Peers 4 and 3 thus need to update their successor state information.
  • Then peer 4 updates its state:
    1. Peer 4 replaces its first successor (peer 5) with its second successor (peer 8).
    2. Peer 4 then asks its new first successor (peer 8) for the identifier and IP address of its immediate successor (peer 10). Peer 4 then makes peer 10 its second successor.
  • Suppose a peer with identifier 13 wants to join the DHT, and at the time of joining, it only knows about peer 1’s existence in the DHT. Peer 13 would first send peer 1 a message, saying “what will be 13’s predecessor and successor?” This message gets forwarded through the DHT until it reaches peer 12, who realizes that it will be 13’s predecessor and that its current successor, peer 15, will become 13’s successor. Next, peer 12 sends this predecessor and successor information to peer 13. Peer 13 can now join the DHT by making peer 15 its successor and by notifying peer 12 that it should change its immediate successor to 13.
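
The departure and join steps above can be sketched as a toy simulation. Everything here is illustrative: the Peer class, build_ring, repair, and join are hypothetical names, "pinging" is modeled as reading an alive flag, and only a single failure at a time is assumed. How peer 3 repairs its second successor is not spelled out in these notes, so the branch handling that case is an assumption.

```python
class Peer:
    def __init__(self, ident):
        self.ident = ident
        self.alive = True
        self.succ = []  # [first successor id, second successor id]

def build_ring(idents):
    """Create peers that each track their first and second successors."""
    ring = sorted(idents)
    peers = {i: Peer(i) for i in ring}
    for k, i in enumerate(ring):
        peers[i].succ = [ring[(k + 1) % len(ring)], ring[(k + 2) % len(ring)]]
    return peers

def repair(peers, ident):
    """One ping round for peer `ident`; fix its successor list if needed."""
    me = peers[ident]
    first, second = me.succ
    if not peers[first].alive:
        # Promote the second successor, then ask it for its own successor.
        me.succ = [second, peers[second].succ[0]]
    elif not peers[second].alive:
        # Assumption: ask the first successor for its first live successor.
        me.succ = [first, next(s for s in peers[first].succ if peers[s].alive)]

def join(peers, new_id, known_id, size=16):
    """Forward the join query from `known_id` until the predecessor is found."""
    current = known_id
    while True:
        nxt = peers[current].succ[0]
        # Stop when new_id falls strictly between current and its successor.
        if 0 < (new_id - current) % size < (nxt - current) % size:
            break
        current = nxt
    new = Peer(new_id)
    new.succ = [nxt, peers[nxt].succ[0]]
    peers[new_id] = new
    peers[current].succ = [new_id, nxt]  # predecessor now points at the newcomer
    return current, nxt

peers = build_ring([1, 3, 4, 5, 8, 10, 12, 15])
peers[5].alive = False          # peer 5 leaves abruptly
repair(peers, 4)
repair(peers, 3)
pred_id, succ_id = join(peers, 13, known_id=1)
print(peers[4].succ, peers[3].succ, pred_id, succ_id, peers[13].succ)
```

The sketch does not update the second-successor entry of peer 12's predecessor (peer 10) after the join; the notes do not describe that step, so it is left out here.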

2.7 Socket Programming: Creating Network Applications in Python


Original article: http://blog.csdn.net/gaoxiangnumber1/article/details/51954685
