Session Initiation Protocol (SIP)
The Session Initiation Protocol (SIP) is a signalling protocol widely used for setting up and
tearing down multimedia communication sessions such as voice and video calls
over the Internet. Other feasible application examples include video conferencing,
streaming multimedia distribution, instant messaging, presence information
and online games. The protocol can be used for creating, modifying and terminating
two-party (unicast) or multiparty (multicast) sessions consisting of one
or several media streams. The modification can involve changing addresses
or ports, inviting more participants, adding or deleting media streams,
etc.
SIP was originally designed by Henning Schulzrinne (Columbia University)
and Mark Handley (UCL) starting in 1996. The latest version of the specification
is RFC 3261[1] from the IETF SIP Working Group.[2] In November 2000, SIP
was accepted as a 3GPP signaling protocol and permanent element of the IMS
architecture for IP-based streaming multimedia services in cellular systems.
SIP is situated at the session layer in the OSI model, and at the application
layer in the TCP/IP model. It has the following characteristics:
Transport-independent: SIP is designed to be independent of the underlying transport layer and can run over UDP, TCP, SCTP, etc.
Text-based: SIP messages can be read and analyzed by humans.
Protocol design
SIP clients typically use TCP or UDP on port 5060 (or port 5061 for TLS-secured
traffic) to connect to SIP servers and other SIP endpoints. SIP is primarily used
in setting up and tearing down voice or video calls. However, it can be
used in any application where session initiation is a requirement. These
include event subscription and notification, terminal mobility and so on.
There are a large number of SIP-related RFCs that define behavior for such
applications. All voice/video communications are done over separate session
protocols, typically RTP.
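Because SIP is text-based, a request can be assembled with ordinary string handling. The following is a minimal sketch, not a complete user agent: the function name build_request, the host client.example.com, the tag value and the example URIs are all hypothetical, and only a handful of the headers defined by RFC 3261 are included.

```python
def build_request(method, target_uri, from_uri, to_uri, call_id, cseq, branch,
                  local_host="client.example.com", local_port=5060,
                  from_tag="1928301774", body="", content_type="application/sdp"):
    """Return a SIP request as bytes.

    Header lines end with CRLF and an empty line separates headers from the
    body. All default values are illustrative assumptions.
    """
    lines = [
        f"{method} {target_uri} SIP/2.0",
        # The branch parameter must start with the magic cookie "z9hG4bK".
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={from_tag}",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
    ]
    payload = body.encode("utf-8")
    if payload:
        lines.append(f"Content-Type: {content_type}")
    # Content-Length counts the bytes of the body, not its characters.
    lines.append(f"Content-Length: {len(payload)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + payload


if __name__ == "__main__":
    message = build_request(
        "OPTIONS", "sip:bob@example.com",
        "sip:alice@example.com", "sip:bob@example.com",
        call_id="a84b4c76e66710@client.example.com",
        cseq=1, branch="z9hG4bK776asdhds",
    )
    print(message.decode("utf-8"))
```

The resulting bytes could be sent in a single UDP datagram or written to a TCP connection; transport handling, retransmission and response matching are deliberately left out of the sketch.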
A motivating goal for SIP was to provide a signaling and call setup protocol
for IP-based communications that can support a superset of the call processing
functions and features present in the public switched telephone network
(PSTN). SIP by itself does not define these features; rather, its focus
is call setup and signaling. However, it was designed so that such features
can be built into the network elements designated proxy servers and user
agents. These features permit familiar telephone-like operations:
dialing a number, causing a phone to ring, hearing ringback tones or a busy
signal. Implementation and terminology are different in the SIP world, but
to the end user the behavior is similar.
SIP-enabled telephony networks can also implement many of the more advanced
call processing features present in Signaling System 7 (SS7), though the
two protocols themselves are very different. SS7 is a centralized protocol,
characterized by a complex central network architecture and dumb endpoints
(traditional telephone handsets). SIP is a peer-to-peer protocol, thus it
requires only a simple (and thus scalable) core network with intelligence
distributed to the network edge, embedded in endpoints (terminating devices
built in either hardware or software). SIP features are implemented in the
communicating endpoints (i.e. at the edge of the network) contrary to traditional
SS7 features, which are implemented in the network.
Although several other VoIP signaling protocols exist, SIP is distinguished
by its proponents for having roots in the IP community rather than the telecom
industry. SIP has been standardized and governed primarily by the IETF while
the H.323 VoIP protocol has been traditionally more associated with the
ITU. However, the two organizations have endorsed both protocols in some
fashion.
SIP works in concert with several other protocols and is only involved in
the signaling portion of a communication session. SIP is a carrier for the
Session Description Protocol (SDP), which describes the media content of
the session, e.g. what IP ports to use, the codec being used etc. In typical
use, SIP "sessions" are simply packet streams of the Real-time Transport
Protocol (RTP). RTP is the carrier for the actual voice or video content
itself.
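The kind of information an SDP body carries can be illustrated by reading one. The sketch below assumes a small, well-formed session description and extracts only the connection address, the media port and the codec mappings; the function name parse_sdp, the sample addresses and the returned dictionary layout are illustrative choices, and most SDP fields are ignored.

```python
EXAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 client.example.com
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(text):
    """Extract connection address, media ports and codecs from an SDP body."""
    session = {"connection": None, "media": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Every SDP line has the form <single letter>=<value>.
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<network type> <address type> <connection address>
            address = value.split()[2]
            if session["media"]:
                session["media"][-1]["address"] = address  # media-level override
            else:
                session["connection"] = address            # session-level default
        elif kind == "m":
            # m=<media> <port> <transport> <format list>
            media, port, proto, *formats = value.split()
            session["media"].append({
                "type": media,
                "port": int(port),
                "proto": proto,
                "formats": formats,
                "codecs": {},
                "address": session["connection"],
            })
        elif kind == "a" and value.startswith("rtpmap:") and session["media"]:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            session["media"][-1]["codecs"][payload_type] = encoding
    return session


if __name__ == "__main__":
    audio = parse_sdp(EXAMPLE_SDP)["media"][0]
    print(audio["address"], audio["port"], audio["codecs"])
```

In this example the endpoint offers to receive RTP audio on port 49170 at 192.0.2.10, encoded as either PCMU or PCMA.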
The first proposed standard version (SIP 2.0) was defined in RFC 2543. The
protocol was further clarified in RFC 3261, although many implementations
are still using interim draft versions. Note that the version number remains
2.0.
SIP is similar to HTTP and shares some of its design principles: It is human
readable and request-response structured. SIP shares many HTTP status codes,
including the familiar "404 Not Found". SIP proponents also
claim it to be simpler than H.323. However, some would counter that while
SIP originally had a goal of simplicity, in its current state it has become
as complex as H.323. Others would argue that SIP is a stateless protocol,
hence making it possible to easily implement failover and other features
that are difficult in stateful protocols like H.323. SIP and H.323 are not
limited to voice communication and can mediate any kind of communication
session, e.g. voice, video and yet-to-be-defined future formats.
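The HTTP-like response format means a SIP status line can be split into version, numeric code and reason phrase, with the first digit of the code giving the response class. A minimal sketch, in which the function name parse_status_line and the returned tuple layout are illustrative:

```python
# Response classes as defined in RFC 3261; the 6xx class has no HTTP counterpart.
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line):
    """Split a SIP status line into (code, reason phrase, response class)."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2 or parts[0] != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 status line: {line!r}")
    code = int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""
    return code, reason, STATUS_CLASSES.get(code // 100, "unknown")


if __name__ == "__main__":
    for example in ("SIP/2.0 180 Ringing", "SIP/2.0 200 OK", "SIP/2.0 404 Not Found"):
        print(parse_status_line(example))
```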
SIP network elements
SIP User Agents (UAs) are the end-user devices used to create
and manage a SIP session. A SIP UA has two main components: the User Agent
Client (UAC), which sends SIP requests and receives the responses, and the User Agent
Server (UAS), which receives SIP requests sent by the peer and returns responses. SIP UAs may work
in point-to-point mode. Typical implementations of a UA are SIP softphones,
SIP hardphones and SIP-enabled ATAs.
SIP also defines server network elements. Although two SIP endpoints can
communicate without any intervening SIP infrastructure, which is why the
protocol is described as peer-to-peer, this approach is impractical for
a public service. RFC 3261 defines these server elements:
"Proxy, Proxy Server: An intermediary entity that acts as both a server
and a client for the purpose of making requests on behalf of other clients.
A proxy server primarily plays the role of routing, which means its job
is to ensure that a request is sent to another entity "closer" to the targeted
user. Proxies are also useful for enforcing policy (for example, making
sure a user is allowed to make a call). A proxy interprets, and, if necessary,
rewrites specific parts of a request message before forwarding it."
"A registrar is a server that accepts REGISTER requests and places the information
it receives in those requests into the location service for the domain it
handles."
"A redirect server is a user agent server that generates 3xx responses to
requests it receives, directing the client to contact an alternate set of
URIs." The redirect server allows SIP proxy servers to direct SIP session
invitations to external domains.
The same RFC specifies: "It is an important concept that the distinction
between types of SIP servers is logical, not physical."
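The logical nature of these roles can be sketched as three functions sharing one location service. This is a simplified illustration under several assumptions: the class and function names are hypothetical, bindings are held in memory, the proxy picks the first registered contact rather than forking, and no SIP messages are actually parsed or sent.

```python
import time


class LocationService:
    """In-memory map from an address-of-record to its registered contacts."""

    def __init__(self):
        self._bindings = {}  # address-of-record -> {contact URI: expiry time}

    def bind(self, aor, contact, expires, now=None):
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(aor, {})
        if expires <= 0:
            contacts.pop(contact, None)       # an expiry of zero removes the binding
        else:
            contacts[contact] = now + expires

    def lookup(self, aor, now=None):
        now = time.time() if now is None else now
        return [c for c, expiry in self._bindings.get(aor, {}).items() if expiry > now]


def registrar_handle_register(location, aor, contact, expires=3600, now=None):
    """Registrar role: store the binding carried by a REGISTER request."""
    location.bind(aor, contact, expires, now)
    return 200, "OK"


def proxy_next_hop(location, request_uri, now=None):
    """Proxy role: choose where to forward a request, or None if the user is unknown."""
    contacts = location.lookup(request_uri, now)
    return contacts[0] if contacts else None


def redirect_response(location, request_uri, now=None):
    """Redirect role: answer with a 3xx response listing alternative URIs."""
    contacts = location.lookup(request_uri, now)
    if not contacts:
        return 404, "Not Found", []
    return 302, "Moved Temporarily", contacts


if __name__ == "__main__":
    location = LocationService()
    registrar_handle_register(location, "sip:bob@example.com", "sip:bob@192.0.2.4:5060")
    print(proxy_next_hop(location, "sip:bob@example.com"))
    print(redirect_response(location, "sip:bob@example.com"))
```

All three functions could run in one process against the same LocationService instance, which is one way to read the statement that the distinction between server types is logical rather than physical.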
Other SIP-related network elements are:
Session border controllers (SBCs), which serve as a "man in the middle" between
the UA and the SIP server.
Various types of gateways at the edge between a SIP network and other networks
(such as a phone network).
Instant messaging (IM) and presence
The Session Initiation Protocol for Instant Messaging and Presence
Leveraging Extensions (SIMPLE) is the SIP-based suite of standards for instant
messaging and presence information. Some efforts have been made to integrate
SIP-based VoIP with the XMPP specification used by Jabber. Most notably
Google Talk, which extends XMPP to support voice, plans to integrate SIP.
Google’s XMPP extension is called Jingle and, like SIP, it acts as
a Session Description Protocol carrier.
Conformance testing
The TTCN-3 test specification language is used for specifying
conformance tests for SIP implementations. A SIP test suite has been developed by
a Specialist Task Force at ETSI (STF 196).[3]
Commercial applications
Firewalls typically block media packet types such as UDP, though
one way around this is to use TCP tunneling and relays for media in order
to provide NAT and firewall traversal. One solution involves tunneling the
media packets within TCP or HTTP packets to a relay. This solution uses
additional functionality in conjunction with SIP, and packages the media
packets into a TCP stream which is then sent to the relay. The relay then
extracts the packets and sends them on to the other endpoint. If the other
endpoint is behind a symmetric NAT, or a corporate firewall that does not
allow VoIP traffic, the relay transfers the packets to another tunnel.
One disadvantage of this approach is that TCP was not designed for real-time
traffic such as voice, so an optimized form of the protocol is sometimes
used.
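Because TCP delivers an undivided byte stream, a tunnel of this kind needs some way to mark where each media packet begins and ends so that the relay can extract them again. The sketch below assumes a 2-byte big-endian length prefix in front of every packet; this wire format and the names frame_packet and Deframer are illustrative assumptions, not a format specified by any particular product described here.

```python
import struct


def frame_packet(packet):
    """Prefix a media packet with its length so it can travel in a TCP stream."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


class Deframer:
    """Relay side: rebuild the original packets from arbitrary TCP chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add received bytes and return every packet that is now complete."""
        self._buffer.extend(data)
        packets = []
        while len(self._buffer) >= 2:
            (length,) = struct.unpack_from("!H", self._buffer, 0)
            if len(self._buffer) < 2 + length:
                break  # wait for the rest of this packet
            packets.append(bytes(self._buffer[2:2 + length]))
            del self._buffer[:2 + length]
        return packets


if __name__ == "__main__":
    stream = frame_packet(b"rtp-packet-1") + frame_packet(b"rtp-packet-2")
    relay = Deframer()
    # TCP may deliver the stream in pieces that ignore packet boundaries.
    for chunk in (stream[:5], stream[5:20], stream[20:]):
        for packet in relay.feed(chunk):
            print(packet)
```

The framing only restores packet boundaries; it does nothing about the delay that TCP retransmission adds, which is the disadvantage noted above.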
As envisioned by its originators, SIP’s peer-to-peer nature does not
enable network-provided services. For example, the network cannot easily
support lawful interception of calls (referred to in the United States by
the law governing wiretaps, CALEA). Emergency calls (calls to E911 in the
USA) are difficult to route. It is difficult to identify the proper Public
Safety Answering Point (PSAP) because of the inherent mobility of IP
endpoints and the lack of any network location capability.
Many VoIP phone companies allow customers to bring their own SIP devices,
such as SIP-capable telephone sets or softphones. The new market for consumer
SIP devices continues to expand.
The free software community has provided more and more of the SIP
technology required to build endpoints as well as proxy and registrar
servers, leading to a commoditization of the technology, which accelerates
global adoption. SIPfoundry has made available and actively develops a variety
of SIP stacks, client applications and SDKs, in addition to entire IP PBX
solutions that compete in the market against mostly proprietary IP PBX implementations
from established vendors.
The National Institute of Standards and Technology (NIST), Advanced Networking
Technologies Division provides a public domain implementation of the Java
standard for SIP, JAIN-SIP, which serves as a reference implementation for
the standard. The stack can work in proxy server or user agent scenarios
and has been used in numerous commercial and research projects. It supports
RFC 3261 in full and a number of extension RFCs, including RFC 3265 (SUBSCRIBE
/ NOTIFY event notification) and RFC 3262 (Reliability of Provisional Responses).