9. Internet Protocols
- Internet Architecture
- IP (Internet Protocol)
- TCP (Transmission Control Protocol)
- TCP Features
- User Datagram Protocol (UDP)
- Domain Name System (DNS)
- Top-Level Domains
- DNS Lookup
- Application Layer Protocols
- Internet Electronic mail
- Sending e-mail
- Multipurpose Internet Mail Extensions (MIME)
- MIME Headers
- Universal Resource Identifiers (URIs)
- URIs, URNs and URLs
- URI Syntax
- Escaping Special URI characters
- Uniform Resource Locators (URLs)
- URL schemes
- Uniform Resource Names (URNs)
- Some Namespace Identifiers
- Resolving URNs
- Links to more information
9.1. Internet Architecture
- network layer concerned with physical connection between nodes
- internet layer deals with addressing and fragmentation of packets (includes ICMP)
- transport layer provides reliable connections (for TCP) between processes on hosts
- application layer contains protocols used by applications
9.2. IP (Internet Protocol)
- defined in RFC 791 (1981)
- for transmitting blocks of data (packets) from sources to
destinations
- each packet has 32-bit destination address (IP address)
- 4 bytes written as n1.n2.n3.n4 where each ni is a decimal number between 0 and 255, e.g., 18.23.0.22
- IP address standard is RFC 1166
- IP fragments packets if necessary (maximum size 65535
bytes)
- each packet routed independently, based on destination and
network load and availability
- no mechanism for reliable delivery
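As a minimal sketch of the dotted-decimal notation above, the following Python packs the four decimal fields into the single 32-bit value carried in a packet. The helper name `dotted_to_int` is purely illustrative; the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Convert 'n1.n2.n3.n4' into the 32-bit integer it denotes."""
    fields = address.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields, got {len(fields)}")
    value = 0
    for field in fields:
        n = int(field)
        if not 0 <= n <= 255:
            raise ValueError(f"field out of range: {field}")
        value = (value << 8) | n  # each field contributes one byte
    return value


if __name__ == "__main__":
    example = "18.23.0.22"  # the address used in the notes
    print(dotted_to_int(example))
    # The standard library should agree with the hand-rolled version.
    print(int(ipaddress.IPv4Address(example)))
```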
9.3. TCP (Transmission Control Protocol)
- defined in Internet standard RFC 793 (1981)
- provides connection-oriented, reliable service
- supports addressing of individual processes via ports
9.4. TCP Features
- acknowledges safe receipt of packets
- detects missing/corrupted/duplicated packets
- provides method to resend missing/corrupted packets
- uses sequence numbers to reassemble packets in same order as
sent
- so looks like stream from application's viewpoint
- used for ftp, telnet, http, smtp
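The stream-like view of TCP listed above can be illustrated with a small loopback experiment. This is only a sketch: the function names are hypothetical, the port is chosen by the operating system, and everything runs on 127.0.0.1. The client sends several separate chunks, and the echoed bytes come back as one ordered stream with no visible packet boundaries.

```python
import socket
import threading


def echo_one_connection(server: socket.socket) -> None:
    """Accept a single connection and echo back every byte received."""
    conn, _peer = server.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # the peer has finished sending
                break
            conn.sendall(data)


def tcp_round_trip(chunks: list[bytes]) -> bytes:
    """Send the chunks over a loopback TCP connection and return the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(5.0)
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(
            target=echo_one_connection, args=(server,), daemon=True
        )
        worker.start()
        received = bytearray()
        with socket.create_connection(("127.0.0.1", port), timeout=5.0) as client:
            for chunk in chunks:
                client.sendall(chunk)
            client.shutdown(socket.SHUT_WR)  # signal end of our data
            while True:
                data = client.recv(1024)
                if not data:
                    break
                received.extend(data)
        worker.join()
    return bytes(received)


if __name__ == "__main__":
    print(tcp_round_trip([b"one ", b"two ", b"three"]))
```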
9.5. User Datagram Protocol (UDP)
- provides connectionless, unreliable service
- so UDP has less overhead than TCP (no connection set-up, acknowledgements or retransmission)
- adds only checksum and process-to-process addressing to IP
- used for DNS and NFS
- used when socket is opened in datagram mode
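A socket opened in datagram mode, as mentioned above, might look like the following sketch. The function name is hypothetical and both ends live on the loopback interface; no connection is set up, and each `sendto` is an independent datagram that is simply addressed to a host and port.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        sender.sendto(message, receiver.getsockname())
        data, _source = receiver.recvfrom(4096)
        return data


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

The timeout is there because a lost datagram would otherwise leave the receiver waiting indefinitely; recovering from loss is left to the application.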
9.6. Domain Name System (DNS)
- DNS defined in RFC 1034 and 1035
- global service mapping DNS names to IP addresses
- names are organised hierarchically, with the most general domain on the right (www.w3.org is within w3.org, which is within org)
- 2 advantages:
- easier to remember www.w3.org than 18.23.0.22
- higher level of abstraction allows simpler reorganisation
9.7. Top-Level Domains
- domains on first level of hierarchy are top-level
domains:
- either country-code top-level domain (ccTLD)
- or generic top-level domain (gTLD)
- ccTLD represented by two-letter country-codes from ISO 3166, e.g., uk, fr, de, ch
- gTLD given in Internet informational RFC 1591:
  - edu: educational institutions
  - com: commercial entities, i.e., companies
  - net: network providers
  - org: organisations, e.g. NGOs
  - gov: government agencies
  - mil: US military
  - int: organisations established by international treaties
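The hierarchy of names and the two kinds of top-level domain can be sketched in a few lines of Python. The function names are hypothetical, the gTLD set contains only the seven domains listed above, and the "two letters means ccTLD" test is a simplification rather than a check against ISO 3166.

```python
# The seven generic top-level domains listed in the notes (RFC 1591).
GENERIC_TLDS = {"edu", "com", "net", "org", "gov", "mil", "int"}


def domain_hierarchy(name: str) -> list[str]:
    """List the enclosing domains of a name, most general first."""
    labels = name.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def classify_tld(name: str) -> str:
    """Classify the top-level domain of a DNS name (simplified)."""
    tld = name.rstrip(".").lower().split(".")[-1]
    if tld in GENERIC_TLDS:
        return "gTLD"
    if len(tld) == 2 and tld.isalpha():
        return "ccTLD"  # assumption: any two-letter TLD is a country code
    return "unknown"


if __name__ == "__main__":
    print(domain_hierarchy("www.w3.org"))
    print(classify_tld("www.w3.org"))
    print(classify_tld("example.fr"))  # illustrative name only
```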
9.8. DNS Lookup
- DNS server is used by browser to map DNS name to IP address
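From a program's point of view, the lookup is usually a single call to the system resolver, which in turn consults DNS servers as needed. The sketch below uses Python's standard `socket.getaddrinfo`; the wrapper name `lookup_ipv4` is an assumption for illustration. The demonstration resolves `localhost`, which is normally answered locally and so does not need network access.

```python
import socket


def lookup_ipv4(name: str) -> list[str]:
    """Return the IPv4 addresses the system resolver gives for a name."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(lookup_ipv4("localhost"))
```

Passing a name such as www.w3.org instead would trigger a real DNS query, and the answer depends on the network the program runs on.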
9.9. Application Layer Protocols
- HTTP (HyperText Transfer Protocol)
  - used by WWW to deliver documents
  - each document has a unique URI (Universal Resource Identifier)
- telnet (for remote login)
- ftp (file transfer protocol)
- NNTP and NNRP (network news protocols)
  - Network News Transfer Protocol (NNTP) used between servers
  - Network News Reading Protocol (NNRP), subset of NNTP, used between client and server
- SMTP, POP3 and IMAP4 (email protocols)
9.10. Internet Electronic mail
- e-mail client responsible for
- retrieving mail from server (POP3, IMAP4)
- sending mail to server (SMTP)
- e-mail server responsible for
- collecting mail from client (SMTP)
- distributing mail to client (POP3, IMAP4)
- relaying mail between e-mail servers (SMTP)
9.11. Sending e-mail
- SMTP (Simple Mail Transfer Protocol)
- defined in RFC 821 (the protocol) and RFC 822 (the message format), both 1982
- use mailto: prefix in URI in browser
- uses TCP port 25
- routing using DNS (Domain Name System)
- address of recipient is of form octopus@garden.under.the.sea
- also used for mailing lists and list servers
9.12. Multipurpose Internet Mail Extensions (MIME)
- SMTP uses 7-bit ASCII format
- inadequate for non-English and non-textual data
- MIME defined in RFCs 2045, 2046, 2047, 2048, 2049; allows
- non-US-ASCII message bodies
- extensible set of different formats for non-textual bodies
- multi-part message bodies
- non-US-ASCII textual header information
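The capabilities listed above can be seen together in one message built with Python's standard `email` package. This is a sketch only: the sender address, subject and attachment bytes are made up for illustration, and the recipient is the example address from these notes.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build a multi-part message with non-ASCII text and a binary part."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"          # hypothetical sender
    msg["To"] = "octopus@garden.under.the.sea"  # address from the notes
    msg["Subject"] = "Grüße"                    # non-US-ASCII header text
    msg.set_content("Café au lait")             # non-US-ASCII text body
    # Adding a non-textual part turns the message into multipart/mixed.
    msg.add_attachment(
        b"GIF89a", maintype="image", subtype="gif", filename="dot.gif"
    )
    return msg


if __name__ == "__main__":
    message = build_message()
    print(message.get_content_type())
    for part in message.iter_parts():
        print(part.get_content_type(), part["Content-Transfer-Encoding"])
```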
9.13. MIME Headers
- MIME headers include:
  - MIME-Version
  - Content-Type: specifies type and subtype
  - Content-Transfer-Encoding: specifies auxiliary encoding
- content of Content-Type header is a MIME type
- examples of MIME types are text/html, image/gif and multipart/mixed
- example of Content-Transfer-Encoding is base64:
  - preferred encoding for 8-bit binary data
  - each group of 3 bytes (24 bits) is encoded as 4 ASCII characters
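The 3-bytes-to-4-characters rule for base64 can be worked through by hand. The sketch below splits one 24-bit group into four 6-bit values and looks each up in the base64 alphabet; `encode_group` is an illustrative helper, and the standard-library `base64` module is used to confirm the result.

```python
import base64
import string

# The 64-character base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("a full group is exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")  # the 24-bit group as an integer
    # Take the four 6-bit slices from most to least significant.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


if __name__ == "__main__":
    print(encode_group(b"Man"))
    print(base64.b64encode(b"Man").decode("ascii"))
```

When the input length is not a multiple of 3, the library pads the final group with `=` characters, so the output length is always a multiple of 4.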
9.14. Universal Resource Identifiers (URIs)
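- a Universal Resource Identifier (URI), now more commonly called a Uniform Resource Identifier, is a string that identifies a resource on the internet
- basic syntax is:
scheme ":" scheme-specific-part
where scheme names the kind of identifier (e.g. http, mailto, urn) and the form of scheme-specific-part depends on the scheme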
9.15. URIs, URNs and URLs
- a URI is either
- a Uniform Resource Name (URN), or
- a Uniform Resource Locator (URL)
- URN names a resource, while URL gives its address
- URN vs URL analogous to DNS name vs IP address
9.16. URI Syntax
- general rules for scheme specific part:
- percent sign: % used as escape character for encoding non-printable or reserved characters (completeness)
- hierarchical forms: / reserved to delimit resources which are arranged hierarchically
- hash sign: # used to delimit a resource reference from a fragment identifier
- query strings: ? used to delimit a resource reference from a query to the resource, + used for space in query string
9.17. Escaping Special URI characters
- if a literal %, /, #, ? or space is needed in a URI, or a literal + in a query string, use the escaped version based on its ASCII hexadecimal value:
| symbol | escaped version |
| % | %25 |
| / | %2F |
| # | %23 |
| ? | %3F |
| space | %20 |
| + | %2B |
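The escapes in the table can be reproduced with Python's standard `urllib.parse` functions. In this sketch, `escape` is an illustrative wrapper that forces every reserved character to be escaped (by default `quote` leaves `/` alone), and `quote_plus` shows the query-string convention of writing a space as `+`.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def escape(text: str) -> str:
    """Percent-encode text, treating no reserved character as safe."""
    return quote(text, safe="")


if __name__ == "__main__":
    for symbol in "%/#? +":
        print(repr(symbol), "->", escape(symbol))

    # In a query string a space becomes '+', so a literal '+' must be escaped.
    encoded = quote_plus("fish + chips")
    print(encoded)
    print(unquote_plus(encoded))
    print(unquote("%2Fa%20b"))
```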
9.18. Uniform Resource Locators (URLs)
- scheme is one of ftp, http, https, mailto, news, nntp, telnet
- scheme specific part has syntax
"//" [ user [ ":" password ] "@" ] host [ ":" port ] [ "/" url-path ] [ "?" query-string ] [ "#" anchor ]
where
  - user and password are optional
  - host is fully qualified domain name or IP address
  - port is optional (usually a default)
  - url-path is path to resource, specific to scheme
  - query-string includes parameters associated with request (usually form fields)
  - anchor is a reference to a part of a resource
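The components named in the syntax above map directly onto what Python's standard `urllib.parse.urlsplit` returns. The URL in this sketch is invented for illustration and uses every optional part; `describe_url` is a hypothetical helper, and the library calls the anchor a "fragment".

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the components named in the syntax rule."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "url-path": parts.path,
        "query-string": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    # Illustrative URL only; example.org is reserved for documentation.
    example = (
        "http://user:secret@www.example.org:8080"
        "/docs/index.html?name=octopus&home=sea#contact"
    )
    for key, value in describe_url(example).items():
        print(f"{key}: {value}")
```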
9.19. URL schemes
- http
  - user name and password not applicable
  - default port number is 80
- https
  - HTTP over Secure Sockets Layer (SSL), since superseded by TLS
  - default port number is 443
- ftp
  - user name and password can be given
  - if not, anonymous ftp used
  - default port number is 21
- telnet
  - host is mandatory
  - default port number is 23
- mailto
  - no need for url-path to be specified
  - program should prompt user for message, then send using SMTP
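The default ports listed above suggest a simple rule for a client: use the port given in the URL if there is one, otherwise fall back to the default for the scheme. The following sketch assumes a hypothetical `effective_port` helper and a table containing only the four defaults stated in these notes.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default ports as listed in the notes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "telnet": 23}


def effective_port(url: str) -> Optional[int]:
    """Return the port a client would connect to, or None if unknown."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port always wins
    return DEFAULT_PORTS.get(parts.scheme)


if __name__ == "__main__":
    for url in (
        "http://www.w3.org/",
        "https://www.w3.org:8443/",         # illustrative explicit port
        "ftp://ftp.example.org/pub",        # illustrative host
        "mailto:octopus@garden.under.the.sea",
    ):
        print(url, "->", effective_port(url))
```

A mailto URL has no host or port of its own, so the helper returns `None` for it; delivery details are left to the mail program.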
9.20. Uniform Resource Names (URNs)
- overcome disadvantages of using URLs, namely:
- dependence on host names
- dependence on file structure on host
- ease with which URL can be invalidated
- no syntactic difference between URN and URL
- URNs not yet supported by browsers
- syntax for URNs:
"urn:" <NID> ":" <NSS>
where
- scheme is urn
- scheme specific part is <NID> ":" <NSS>
- <NID> is Namespace IDentifier
- <NSS> is Namespace Specific String
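Splitting a URN into its two parts follows directly from the syntax above. In this sketch `parse_urn` is a hypothetical helper and the ISBN value is only an illustration; the NID is lower-cased on the assumption that namespace identifiers are compared without regard to case, while the NSS is returned untouched.

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (NID, NSS)."""
    scheme, first_colon, rest = urn.partition(":")
    nid, second_colon, nss = rest.partition(":")
    if scheme.lower() != "urn" or not first_colon or not second_colon:
        raise ValueError(f"not a URN: {urn!r}")
    if not nid or not nss:
        raise ValueError(f"empty NID or NSS in {urn!r}")
    # Only the first two colons are delimiters; the NSS may contain more.
    return nid.lower(), nss


if __name__ == "__main__":
    print(parse_urn("urn:isbn:0451450523"))  # illustrative ISBN
```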
9.21. Some Namespace Identifiers
9.22. Resolving URNs
- need Resolver Discovery Service (RDS) to find resolver for particular <NID>
- send <NSS> to resolver which returns URL of resource
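The two steps above can be mimicked with a toy in-memory registry. Everything in this sketch is hypothetical: the registry stands in for a Resolver Discovery Service, the single resolver is invented, and the URL it returns points at a made-up host. It is meant only to show the control flow, not how any real resolution service works.

```python
from typing import Callable, Dict


def resolve_isbn(nss: str) -> str:
    """Hypothetical resolver for the 'isbn' namespace."""
    return "http://books.example.org/isbn/" + nss  # made-up location


# Stand-in for a Resolver Discovery Service: maps NID -> resolver.
RESOLVERS: Dict[str, Callable[[str], str]] = {"isbn": resolve_isbn}


def discover_resolver(nid: str) -> Callable[[str], str]:
    """Step 1: find the resolver responsible for a namespace identifier."""
    try:
        return RESOLVERS[nid.lower()]
    except KeyError:
        raise LookupError(f"no resolver known for NID {nid!r}") from None


def resolve_urn(urn: str) -> str:
    """Step 2: hand the NSS to the resolver, which returns a URL."""
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return discover_resolver(nid)(nss)


if __name__ == "__main__":
    print(resolve_urn("urn:isbn:0451450523"))
```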
9.23. Links to more information
A basic overview of the Internet is given in the book by Comer. More detail is given in the book by Wilde. A more recent book that covers this material is the one by Shklar and Rosen. Section 8.1 in [Moller and Schwartzbach] also gives a brief overview.