
- •Contents
- •Foreword to the First Edition
- •Preface to the Third Edition
- •Preface to the Second Edition
- •Preface to the First Edition
- •1 SIP and the Internet
- •1.1 Signaling Protocols
- •1.2 Internet Multimedia Protocol Stack
- •1.2.1 Physical Layer
- •1.2.2 Data/Link Layer
- •1.2.3 Network Layer
- •1.2.4 Transport Layer
- •1.2.5 Application Layer
- •1.2.6 Utility Applications
- •1.2.7 Multicast
- •1.3 Internet Names
- •1.4 URLs, URIs, and URNs
- •1.5 Domain Name Service
- •1.5.1 DNS Resource Records
- •1.5.2 Address Resource Records (A or AAAA)
- •1.5.3 Service Resource Records (SRV)
- •1.5.4 Naming Authority Pointer Resource Records (NAPTR)
- •1.5.5 DNS Resolvers
- •1.6 Global Open Standards
- •1.7 Internet Standards Process
- •1.8 A Brief History of SIP
- •1.9 Conclusion
- •References
- •2 Introduction to SIP
- •2.1 A Simple Session Establishment Example
- •2.2 SIP Call with a Proxy Server
- •2.3 SIP Registration Example
- •2.4 SIP Presence and Instant Message Example
- •2.5 Message Transport
- •2.5.1 UDP Transport
- •2.5.2 TCP Transport
- •2.5.3 TLS Transport
- •2.5.4 SCTP Transport
- •2.6 Transport Protocol Selection
- •2.7 Conclusion
- •2.8 Questions
- •References
- •3 SIP Clients and Servers
- •3.1 SIP User Agents
- •3.2 Presence Agents
- •3.3 Back-to-Back User Agents
- •3.4 SIP Gateways
- •3.5 SIP Servers
- •3.5.1 Proxy Servers
- •3.5.2 Redirect Servers
- •3.5.3 Registrar Servers
- •3.6 Uniform Resource Indicators
- •3.7 Acknowledgment of Messages
- •3.8 Reliability
- •3.9 Multicast Support
- •3.10 Conclusion
- •3.11 Questions
- •References
- •4 SIP Request Messages
- •4.1 Methods
- •4.1.1 INVITE
- •4.1.2 REGISTER
- •4.1.5 CANCEL
- •4.1.6 OPTIONS
- •4.1.7 SUBSCRIBE
- •4.1.8 NOTIFY
- •4.1.9 PUBLISH
- •4.1.10 REFER
- •4.1.11 MESSAGE
- •4.1.12 INFO
- •4.1.13 PRACK
- •4.1.14 UPDATE
- •4.2 URI and URL Schemes Used by SIP
- •4.2.1 SIP and SIPS URIs
- •4.2.2 Telephone URLs
- •4.2.3 Presence and Instant Messaging URLs
- •4.3 Tags
- •4.4 Message Bodies
- •4.5 Conclusion
- •4.6 Questions
- •References
- •5 SIP Response Messages
- •5.1 Informational
- •5.1.1 100 Trying
- •5.1.2 180 Ringing
- •5.1.3 181 Call is Being Forwarded
- •5.1.4 182 Call Queued
- •5.1.5 183 Session Progress
- •5.2 Success
- •5.2.2 202 Accepted
- •5.3 Redirection
- •5.3.1 300 Multiple Choices
- •5.3.2 301 Moved Permanently
- •5.3.3 302 Moved Temporarily
- •5.3.4 305 Use Proxy
- •5.3.5 380 Alternative Service
- •5.4 Client Error
- •5.4.1 400 Bad Request
- •5.4.2 401 Unauthorized
- •5.4.3 402 Payment Required
- •5.4.4 403 Forbidden
- •5.4.5 404 Not Found
- •5.4.6 405 Method Not Allowed
- •5.4.7 406 Not Acceptable
- •5.4.8 407 Proxy Authentication Required
- •5.4.9 408 Request Timeout
- •5.4.11 410 Gone
- •5.4.12 411 Length Required
- •5.4.13 412 Conditional Request Failed
- •5.4.14 413 Request Entity Too Large
- •5.4.15 414 Request-URI Too Long
- •5.4.16 415 Unsupported Media Type
- •5.4.17 416 Unsupported URI Scheme
- •5.4.18 417 Unknown Resource Priority
- •5.4.19 420 Bad Extension
- •5.4.20 421 Extension Required
- •5.4.21 422 Session Timer Interval Too Small
- •5.4.22 423 Interval Too Brief
- •5.4.23 428 Use Identity Header
- •5.4.24 429 Provide Referror Identity
- •5.4.25 430 Flow Failed
- •5.4.26 433 Anonymity Disallowed
- •5.4.27 436 Bad Identity-Info Header
- •5.4.29 438 Invalid Identity Header
- •5.4.30 439 First Hop Lacks Outbound Support
- •5.4.31 440 Max-Breadth Exceeded
- •5.4.32 470 Consent Needed
- •5.4.33 480 Temporarily Unavailable
- •5.4.34 481 Dialog/Transaction Does Not Exist
- •5.4.35 482 Loop Detected
- •5.4.36 483 Too Many Hops
- •5.4.37 484 Address Incomplete
- •5.4.38 485 Ambiguous
- •5.4.39 486 Busy Here
- •5.4.40 487 Request Terminated
- •5.4.41 488 Not Acceptable Here
- •5.4.42 489 Bad Event
- •5.4.43 491 Request Pending
- •5.4.44 493 Request Undecipherable
- •5.4.45 494 Security Agreement Required
- •5.5 Server Error
- •5.5.1 500 Server Internal Error
- •5.5.2 501 Not Implemented
- •5.5.3 502 Bad Gateway
- •5.5.4 503 Service Unavailable
- •5.5.5 504 Gateway Timeout
- •5.5.6 505 Version Not Supported
- •5.5.7 513 Message Too Large
- •5.5.8 580 Preconditions Failure
- •5.6 Global Error
- •5.6.1 600 Busy Everywhere
- •5.6.2 603 Decline
- •5.6.3 604 Does Not Exist Anywhere
- •5.6.4 606 Not Acceptable
- •5.7 Questions
- •References
- •6 SIP Header Fields
- •6.1 Request and Response Header Fields
- •6.1.1 Accept
- •6.1.2 Accept-Encoding
- •6.1.3 Accept-Language
- •6.1.4 Alert-Info
- •6.1.5 Allow
- •6.1.6 Allow-Events
- •6.1.7 Answer-Mode
- •6.1.8 Call-ID
- •6.1.9 Contact
- •6.1.10 CSeq
- •6.1.11 Date
- •6.1.12 Encryption
- •6.1.13 Expires
- •6.1.14 From
- •6.1.15 History Info
- •6.1.16 Organization
- •6.1.17 Path
- •6.1.19 Record-Route
- •6.1.20 Recv-Info
- •6.1.21 Refer-Sub
- •6.1.22 Retry-After
- •6.1.23 Subject
- •6.1.24 Supported
- •6.1.25 Timestamp
- •6.1.27 User-Agent
- •6.2 Request Header Fields
- •6.2.1 Accept-Contact
- •6.2.2 Authorization
- •6.2.3 Call-Info
- •6.2.4 Event
- •6.2.5 Hide
- •6.2.6 Identity
- •6.2.7 Identity-Info
- •6.2.8 In-Reply-To
- •6.2.9 Info-Package
- •6.2.10 Join
- •6.2.11 Priority
- •6.2.12 Privacy
- •6.2.13 Proxy-Authorization
- •6.2.14 Proxy-Require
- •6.2.15 P-OSP-Auth-Token
- •6.2.16 P-Asserted-Identity
- •6.2.17 P-Preferred-Identity
- •6.2.18 Max-Breadth
- •6.2.19 Max-Forwards
- •6.2.20 Reason
- •6.2.21 Refer-To
- •6.2.22 Referred-By
- •6.2.23 Reply-To
- •6.2.24 Replaces
- •6.2.25 Reject-Contact
- •6.2.26 Request-Disposition
- •6.2.27 Require
- •6.2.28 Resource-Priority
- •6.2.29 Response-Key
- •6.2.30 Route
- •6.2.31 RAck
- •6.2.32 Security-Client
- •6.2.33 Security-Verify
- •6.2.34 Session-Expires
- •6.2.35 SIP-If-Match
- •6.2.36 Subscription-State
- •6.2.37 Suppress-If-Match
- •6.2.38 Target-Dialog
- •6.2.39 Trigger-Consent
- •6.3 Response Header Fields
- •6.3.1 Accept-Resource-Priority
- •6.3.2 Authentication-Info
- •6.3.3 Error-Info
- •6.3.4 Flow-Timer
- •6.3.5 Min-Expires
- •6.3.7 Permission-Missing
- •6.3.8 Proxy-Authenticate
- •6.3.9 Security-Server
- •6.3.10 Server
- •6.3.11 Service-Route
- •6.3.12 SIP-ETag
- •6.3.13 Unsupported
- •6.3.14 Warning
- •6.3.15 WWW-Authenticate
- •6.3.16 RSeq
- •6.4 Message Body Header Fields
- •6.4.1 Content-Encoding
- •6.4.2 Content-Disposition
- •6.4.3 Content-Language
- •6.4.4 Content-Length
- •6.4.5 Content-Type
- •6.4.6 MIME-Version
- •6.5 Questions
- •References
- •7 Wireless, Mobility, and IMS
- •7.1 IP Mobility
- •7.2 SIP Mobility
- •7.4 IMS Header Fields
- •7.5 Conclusion
- •7.6 Questions
- •References
- •8 Presence and Instant Messaging
- •8.1 Introduction
- •8.2 History of IM and Presence
- •8.3 SIMPLE
- •8.4 Presence with SIMPLE
- •8.4.1 SIP Events Framework
- •8.4.2 Presence Bodies
- •8.4.3 Resource Lists
- •8.4.4 Filtering
- •8.4.6 Partial Publication
- •8.4.7 Presence Documents Summary
- •8.5 Instant Messaging with SIMPLE
- •8.5.1 Page Mode Instant Messaging
- •8.5.4 Message Composition Indication
- •8.5.5 Multiple Recipient Messages
- •8.5.6 Session Mode Instant Messaging
- •8.6 Jabber
- •8.6.1 Standardization as Extensible Messaging and Presence Protocol
- •8.6.2 Interworking with SIMPLE
- •8.6.3 Jingle
- •8.6.4 Future Standardization of XMPP
- •8.7 Conclusion
- •8.8 Questions
- •References
- •9 Services in SIP
- •9.1 Gateway Services
- •9.2 SIP Trunking
- •9.3 SIP Service Examples
- •9.4 Voicemail
- •9.5 SIP Video
- •9.6 Facsimile
- •9.7 Conferencing
- •9.7.1 Focus
- •9.7.2 Mixer
- •9.8 Application Sequencing
- •9.9 Other SIP Service Architectures
- •9.9.1 Service Oriented Architecture
- •9.9.2 Servlets
- •9.9.3 Service Delivery Platform
- •9.10 Conclusion
- •9.11 Questions
- •References
- •10 Network Address Translation
- •10.1 Introduction to NAT
- •10.2 Advantages of NAT
- •10.3 Disadvantages of NAT
- •10.4 How NAT Works
- •10.5 Types of NAT
- •10.5.1 Endpoint Independent Mapping NAT
- •10.5.2 Address Dependent Mapping NAT
- •10.5.3 Address and Port Dependent Mapping NAT
- •10.5.4 Hairpinning Support
- •10.5.5 IP Address Pooling Options
- •10.5.6 Port Assignment Options
- •10.5.7 Mapping Refresh
- •10.5.8 Filtering Modes
- •10.6 NAT Mapping Examples
- •10.7 NATs and SIP
- •10.8 Properties of a Friendly NAT or How a NAT Should BEHAVE
- •10.9 STUN Protocol
- •10.10 UNSAF Requirements
- •10.11 SIP Problems with NAT
- •10.11.1 Symmetric SIP
- •10.11.2 Connection Reuse
- •10.11.3 SIP Outbound
- •10.12 Media NAT Traversal Solutions
- •10.12.1 Symmetric RTP
- •10.12.2 RTCP Attribute
- •10.12.3 Self-Fixing Approach
- •10.13 Hole Punching
- •10.14 TURN: Traversal Using Relays Around NAT
- •10.15 ICE: Interactive Connectivity Establishment
- •10.16 Conclusion
- •10.17 Questions
- •References
- •11 Related Protocols
- •11.1 PSTN Protocols
- •11.1.1 Circuit Associated Signaling
- •11.1.2 ISDN Signaling
- •11.1.3 ISUP Signaling
- •11.2 SIP for Telephones
- •11.3 Media Gateway Control Protocols
- •11.4.1 Introduction to H.323
- •11.4.2 Example of H.323
- •11.4.3 Versions
- •References
- •12 Media Transport
- •12.1 Real-Time Transport Protocol (RTP)
- •12.2 RTP Control Protocol (RTCP)
- •12.2.1 RTCP Reports
- •12.2.2 RTCP Extended Reports
- •12.3 Compression
- •12.4.1 Audio Codecs
- •12.4.2 Video Codecs
- •12.5 Conferencing
- •12.6 ToIP—Conversational Text
- •12.7 DTMF Transport
- •12.8 Questions
- •References
- •13 Negotiating Media Sessions
- •13.1 Session Description Protocol (SDP)
- •13.1.1 Protocol Version
- •13.1.2 Origin
- •13.1.3 Session Name and Information
- •13.1.5 E-Mail Address and Phone Number
- •13.1.6 Connection Data
- •13.1.7 Bandwidth
- •13.1.8 Time, Repeat Times, and Time Zones
- •13.1.9 Encryption Keys
- •13.1.10 Media Announcements
- •13.1.11 Attributes
- •13.2 SDP Extensions
- •13.3 The Offer Answer Model
- •13.3.1 Rules for Generating an Offer
- •13.3.2 Rules for Generating an Answer
- •13.3.3 Rules for Modifying a Session
- •13.3.4 Special Case—Call Hold
- •13.4 Static and Dynamic Payloads
- •13.5 SIP Offer Answer Exchanges
- •13.6 Conclusion
- •13.7 Questions
- •References
- •14 SIP Security
- •14.1 Basic Security Concepts
- •14.1.1 Encryption
- •14.1.2 Public Key Cryptography
- •14.1.4 Message Authentication
- •14.2 Threats
- •14.3 Security Protocols
- •14.3.1 IPSec
- •14.3.3 DNSSec
- •14.3.4 Secure MIME
- •14.4 SIP Security Model
- •14.4.1 SIP Digest Authentication
- •14.4.2 SIP Authentication Using TLS
- •14.4.3 Secure SIP
- •14.4.4 Identity
- •14.4.5 Enhanced SIP Identity
- •14.6 Media Security
- •14.6.1 Non-RTP Media
- •14.6.2 Secure RTP
- •14.6.3 Keying SRTP
- •14.6.4 Best Effort Encryption
- •14.6.5 ZRTP
- •14.7 Questions
- •References
- •15 Peer-to-Peer SIP
- •15.1 P2P Properties
- •15.2 P2P Properties of SIP
- •15.3 P2P Overlays
- •15.4 RELOAD
- •15.5 Host Identity Protocol
- •15.6 Conclusion
- •15.7 Questions
- •References
- •16 Call Flow Examples
- •16.1 SIP Call with Authentication, Proxies, and Record-Route
- •16.2 SIP Call with Stateless and Stateful Proxies with Called Party Busy
- •16.3 SIP to PSTN Call Through Gateways
- •16.4 PSTN to SIP Call Through a Gateway
- •16.5 Parallel Search
- •16.6 Call Setup with Two Proxies
- •16.7 SIP Presence and Instant Message Example
- •References
- •17 Future Directions
- •17.2 More Extensions
- •17.3 Better Identity
- •17.4 Interdomain SIP
- •17.5 Making Features Work Better
- •17.6 Emergency Calling
- •17.7 More SIP Trunking
- •17.9 Improved NAT Traversal
- •17.10 Security Deployment
- •17.11 Better Interoperability
- •References
- •A.1 ABNF Rules
- •A.2 Introduction to XML
- •References
- •About the Author
2 |
SIP: Understanding the Session Initiation Protocol |
•Modification of existing media sessions;
•Teardown of existing media sessions.
SIP has also been extended to request and deliver presence information (online/off-line status and location information such as that contained in a buddy list) as well as instant message sessions. These functions include:
•Publishing and uploading of presence information;
•Requesting delivery of presence information;
•Presence and other event notification;
•Transporting of instant messages.
While some of the examples discuss SIP from a telephony perspective, there will be many nontelephony uses for SIP. SIP will likely be used to establish a set of session types that bear almost no resemblance to a telephone call.
The following section will introduce the Internet multimedia protocol stack and discuss these protocols at a high level.
1.2 Internet Multimedia Protocol Stack
Figure 1.1 shows the five layer Internet multimedia protocol stack. The layers shown and protocols identified will be discussed.
1.2.1Physical Layer
The physical layer is the lowest layer of the protocol stack. It shows how devices are physically connected with each other. Common physical layer methods include copper (coax, twisted pair, or other wired connections), photons (fiber optics, laser light, or other photonic sources), or phonons (radio waves, microwaves, or other electromagnetic transmissions).
1.2.2Data/Link Layer
The next layer is the data/link layer, which could be an Ethernet local area network (LAN); a telephone line (V.90 or 56k modem) running Point-to-Point Protocol (PPP); a digital subscriber line (DSL); or even a wireless 802.11 network. This layer performs such functions as symbol exchange, frame synchronization, and physical interface specification. Ethernet typically adds a 13-octet header and a 3-octet footer to every packet sent. Note that an octet is 8 bits of data, sometimes called a byte.

SIP and the Internet |
3 |
1.2.3Network Layer
The next layer in Figure 1.1 is the network or Internet layer. Internet Protocol (IP) [2] is used at this layer to route a packet across the network using the destination IP address. IP is a connectionless, best-effort packet delivery protocol. IP packets can be lost, delayed, or received out of sequence. Each packet is routed on its own using the IP header appended to the physical packet. Most IP address examples in this book use the older version of IP, version 4 (IPv4). IPv4 addresses are four octets long, usually written in “dotted decimal” notation (for example, 207.134.3.5). At the IP layer, packets are not acknowledged. A checksum is calculated to detect corruption in the IP header, which could cause a packet to become misrouted. Corruption or errors in the IP payload, however, are not detected; a higher layer must perform this function if necessary, and it is usually done at the transport layer. IP uses a single-octet protocol number in the packet header to identify the transport layer protocol that should receive the packet.
IP version 6 (IPv6) [3] was developed by the IETF as a replacement for IPv4. It has been slowly gaining support and is supported now by most operating systems. The biggest initial networks of IPv6 are wireless telephony carriers who need the most important advantage of IPv6 over IPv4—a much enlarged addressing space. IPv6 increases the addressing space from 32 bits in IPv4 to 128 bits, providing for over 4 billion IPv6 addresses. An IPv6 address is typically written as a sequence of eight hexadecimal numbers separated by colons. For example, 0:0:0:0:aaaa:bbbb:cccc:dddd is an IPv6 address written in this format. It is also common to drop sequences of zeros with a single double colon. This same address can then be written as ::aaaa:bbbb:cccc:dddd. SIP can use either IPv4 or IPv6.
IP addresses used over the public Internet are assigned in blocks by regional internet registries (RIR). For example, the American Registry for Internet
Figure 1.1 The Internet multimedia protocol stack.
4 |
SIP: Understanding the Session Initiation Protocol |
Numbers (ARIN) allocates addresses in North America while Réseaux IP Européens Network Coordination Centre (RIP ENCC) allocates addresses in Europe. The Internet Assigned Number Association (IANA) manages the overall IP address pool, delegating blocks to the RIRs. Individual end users and enterprises use IP addresses allocated to them by their Internet Service Provider (ISP) from a regional registry.
As a result of this centralized assignment, IP addresses are globally unique. This enables a packet to be routed across the public Internet using only the destination IP address. Various protocols are used to route packets over an IP network, but they are outside of the scope of this book. Subnetting and other aspects of the structure of IP addresses are also not covered here. There are other excellent sources [4] that cover the entire suite of TCP/IP protocols in more detail.
Private IP addresses are addresses that are not routable on the public Internet but can be routable on a stub network LAN. A router performing network address translation (NAT) is used when a host with a private IP address needs to access resources on the public Internet. NAT temporarily binds or maps a host’s private IP address, which is only routable within the LAN with a public IP address that has been allocated to the NAT. The NAT rewrites IP packets as they pass through in both directions, allowing connections. A detailed description of NAT and how it affects SIP and Internet communications can be found in Chapter 10. There are three IPv4 address blocks which have been allocated for private addresses in [5]:
10.0.0.0 - 10.255.255.255 or 10/8 172.16.0.0 - 172.31.255.255 or 172.16/12 192.168.0.0 - 192.168.255.255 or 192.168/16
Configuration information for Internet Protocol can be manually configured in a host or it can be learned automatically. Typically a host needs to know its own IP address, default gateway, subnet mask, and DNS server addresses. One common protocol for this is Dynamic Host Configuration Protocol (DHCP) which is defined by RFC 2132 [6]. DHCP allows a host to autodiscover all these parameters upon initialization of the IP stack. There are various DHCP extensions which have been defined to autoconfigure other protocols, including SIP.
1.2.4Transport Layer
The next layer shown in Figure 1.1 is the transport layer. It uses a two-octet port number from the application layer to deliver the datagram or segment to the correct application layer protocol at the destination IP address. There are two commonly used transport layer protocols: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). In addition, there are two uncommon trans-
SIP and the Internet |
5 |
port protocols: Stream Control Transmission Protocol (SCTP) and Datagram Congestion Control Protocol (DCCP), which are beginning to be used on the Internet. There is also Transport Layer Security (TLS) which provides security on top of TCP. These protocols are introduced in the following sections.
1.2.4.1 Transmission Control Protocol
Transmission Control Protocol (TCP) [7] provides reliable, connection-oriented transport over IP. A TCP connection between two hosts over an IP network is sometimes known as a socket. TCP is a client/server protocol. Servers “listen” on a specific port number for an incoming request to open a socket. A client sends a request to open a new socket to the server on the well-known port. The combination of the source IP address, source port, destination IP address, and destination port identifies the socket connection. As such, it is possible for two hosts to have multiple TCP connections open between them.
TCP uses sequence numbers and positive acknowledgments to ensure that each block of data, called a segment, has been received. Lost segments are retransmitted until they are successfully received. Figure 1.2 shows the message exchange to establish and tear down a TCP connection. A TCP server listens on a well-known port for a TCP SYN (synchronization) message to open the connection. The SYN message contains the initial sequence number the client will use during the connection. The server responds with an ACK message to acknowledge the SYN with an acknowledgment number, and then follows up with its own SYN message containing its own initial sequence number. Often, these two messages are combined into one SYN-ACK message that does both functions. The client completes the three-way handshake with an ACK or a DATA packet with the AK flag set to the server acknowledging the server’s sequence number. Now that the connection is open, either client or server can send data in DATA packets (segments). The connection is closed when either side sends a FIN packet that receives an ACK. This exchange is shown in Figure 1.2.
TCP sends data in units called segments. The maximum segment size (MSS) is negotiated between the hosts during the handshake, and is usually based on the maximum transmission unit (MTU) of the local network. In general, the larger the segment size the more efficient the transport, except when packet loss is present when smaller segments can result in fewer retransmissions. A typical MTU value for the Internet is 1,500 octets.
TCP uses cumulative acknowledgements for reliability. The recipient sends ACK packets including the next sequence number it expects to receive. If a sender does not receive an ACK within a certain time period, the segment is resent. An example is shown in Figure 1.3.
TCP also has built in flow control. Flow control is used by a receiver to slow down the rate of transmission to allow the receiver to properly process or buffer incoming segments. TCP uses a sliding window for end-to-end control.

6 |
SIP: Understanding the Session Initiation Protocol |
|||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Figure 1.2 TCP handshake example.
Figure 1.3 TCP reliability example.
Senders can only send the number of octets in the window before waiting for an ACK. A receiver can reduce the size of the window in ACK messages, even setting it to 0 to cause the sender to stop sending. Once the receiver has caught up, another ACK can be sent to increase the window size and resume the flow of segments. This is shown in Figure 1.4.

|
|
|
|
SIP and the Internet |
7 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Figure 1.4 TCP flow control example.
TCP also has built in congestion control. TCP uses a slow-start algorithm to attempt to avoid congestion. When congestion occurs, TCP uses a fast retransmit and a fast recovery. The details of how these algorithms work can be found in any good TCP/IP reference such as [4].
TCP adds a 20-octet header field to each packet, and is a stream-oriented transport. An application using TCP to send messages must provide its own framing or separation between messages. Error segments are detected by a checksum covering both the TCP header and payload.
1.2.4.2 Transport Port Numbers
Ports numbers are used by the transport layer to multiplex and demultiplex multiple connections on a single host. Otherwise a pair of hosts could only have a single connection between them. Also, messages for different protocols can be separated by using different port numbers. Often these port numbers are associated with a specific protocol. Others are registered to a particular protocol. Ports are a 16 bit integer. Ports in the range 0 to 1024 are called well-known ports. Ports in the range of 1024 through 49151 are known as registered ports. Ports in the range of 49152 through 65535 are known as dynamic, private, or ephemeral ports. For example, Web servers use the well known port of 80, SIP uses the registered ports of 5060 and 5061, while RTP usually uses a dynamic port.
1.2.4.3 User Datagram Protocol
User Datagram Protocol (UDP) [8] provides unreliable transport across the Internet. It is a best-effort delivery service, since there is no acknowledgment of
8 |
SIP: Understanding the Session Initiation Protocol |
sent datagrams. Most of the complexity of TCP is not present, including sequence numbers, acknowledgments, and window sizes. UDP does detect datagrams with errors with a checksum. It is up to higher layer protocols to detect this datagram loss and initiate a retransmission if desired.
UDP does not provide congestion control or flow control—if any of these functions are needed, they must be built into the application layer protocol. UDP is best suited for short, single packet exchanges such as DNS or routing queries. It is also good for real-time, low latency transports protocols such as SIP and RTP.
UDP adds an 8 octet header field to datagrams. Applications and protocols that use UDP must do their own framing—they must break up information into individual UDP packets. For a message oriented protocol, this typically means one message or request per UDP datagram.
1.2.4.4 Transmission Layer Security
Transmission Layer Security (TLS) [9] is based on the Secure Sockets Layer (SSL) protocol first used in Web browsers. TLS uses TCP for transport although it has recently been extended to also run over UDP. TLS is commonly used today on the Internet for secure Web sites using the secure HTTP (https) URI scheme.
The TLS protocol has two layers: the TLS Transport Protocol and the TLS Handshake Protocol. The TLS Transport Protocol is used to provide a reliable and private transport mechanism. Data sent using the TLS Transport Protocol is encrypted so that a third party cannot intercept the data. A third party also cannot modify the transported data without one of the parties discovering this. The TLS Handshake Protocol is used to establish the connection, negotiate the encryption keys used by the TLS Transport Protocol, and provide authentication.
The key agreement scheme selects an encryption algorithm and generates a one-time key based on a secret passed between the two sides. During the handshake, the parties exchange certificates, which can be used for authentication. The cryptographic computations for a TLS connection are not trivial, and the multiple round trips needed to open a connection can add to message latency. Also, certificate verification can introduce processing delays. However, TLS transport has clear security advantages over UDP or TCP. TLS is widely supported due to its use in secure Web browsers and servers. TLS will be discussed more in Chapter 14.
1.2.4.5 Stream Control Transport Protocol
The Stream Control Transmission Protocol (SCTP) [10] is similar to TCP in that it provides reliable stream-based transport. However, it has some advantages over TCP transport for a message-based protocol. First, it has built-in message segmentation, so that individual messages are separated at the transport layer. Another advantage is that SCTP avoids the so-called “head of line blocking”