Introduction to Internet Applications

Peter Wood

Representation and Transfer

Web Protocols

Some Other Application Layer Protocols

Uniform Resource Identifiers (URIs)

Uniform Resource Locators (URLs)

  • scheme examples include
    • ftp, http, https, mailto, telnet
  • in the following syntax, [ ... ] denotes an optional part
  • everything else not in quotes denotes a string to be supplied
  • the scheme-specific part has the syntax
    "//" [ user [ ":" password ] "@" ] host [ ":" port ] [ "/" url-path ] [ "?" query-string ] [ "#" anchor ]
    where
    • user and password are not often used
    • host is a fully qualified domain name or IP address
    • port is optional (usually a default)
    • url-path is the path to the resource, specific to scheme
    • query-string includes parameters associated with the request (usually form fields)
    • anchor is a reference to a part of a resource (a fragment identifier)

URL example

In http://www.dcs.bbk.ac.uk/staff/staffperson.php?name=ptw

  • http is the scheme
  • www.dcs.bbk.ac.uk is the host
  • staff/staffperson.php is the url-path
  • name=ptw is the query string
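
The same decomposition can be reproduced with Python's standard urllib.parse module. This is only a sketch of one way to inspect a URL; note that urlsplit reports the path with its leading "/", whereas the syntax above treats the "/" as a separator.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.dcs.bbk.ac.uk/staff/staffperson.php?name=ptw"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.dcs.bbk.ac.uk
print(parts.path)      # /staff/staffperson.php
print(parts.query)     # name=ptw
print(parts.port)      # None: no explicit port was given in the URL
print(parse_qs(parts.query))  # {'name': ['ptw']}
```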

URL schemes

  • http
    • user name and password usually not applicable
    • default port number is 80
  • https
    • HTTP over Secure Sockets Layer (SSL), now replaced by Transport Layer Security (TLS)
    • default port number is 443
  • ftp
    • user name and password can be given
    • if not, anonymous ftp used
    • default port number is 21
  • telnet
    • host is mandatory
    • default port number is 23
  • mailto
    • no need for url-path to be specified
    • program should prompt user for message, then send using SMTP

Escaping Special URI characters

  • the space character is not allowed in URIs
  • some characters, e.g., /, # and ?, have special meaning in URIs
  • also & is used to separate parameters in a query string
  • so if we need any of these as an ordinary character in a URI, we use the escaped version
  • the escaped version is the character % followed by the ASCII hexadecimal value of the character
  • as a result, % has a special meaning too, so it must also be escaped
  • the escaped versions of the above special characters are as follows:
    symbol escaped version
    % %25
    / %2F
    # %23
    ? %3F
    space %20
    & %26
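
The escaped versions in the table can be checked with quote and unquote from Python's urllib.parse. This is a small sketch; the parameter value "R&D 100%" is made up for illustration, and safe="" is passed because quote leaves "/" unescaped by default.

```python
from urllib.parse import quote, unquote

# safe="" asks quote() to escape "/" as well, which it leaves alone by default.
for ch in ["%", "/", "#", "?", " ", "&"]:
    print(repr(ch), quote(ch, safe=""))

value = "R&D 100%"  # hypothetical parameter value
escaped = quote(value, safe="")
print(escaped)           # R%26D%20100%25
print(unquote(escaped))  # R&D 100%
```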

Domain Name System (DNS)

  • provides a service mapping (human-readable) DNS names to IP addresses
  • browsers, mail software and most other Internet applications use DNS
  • although the TCP/IP protocols themselves use only IP addresses
  • DNS has two advantages:
    • easier to remember www.w3.org than 128.30.52.37
    • higher level of abstraction allows simpler reorganisation
  • names are organised hierarchically:
    • most significant part of the name on the right (specified by DNS)
    • left-most segment of a name is the name of an individual computer
  • DNS is essentially
    • a distributed database implemented as a hierarchy of DNS servers
    • an application-layer protocol allowing hosts to query the database
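
To make the right-to-left hierarchy concrete, the sketch below lists the domains that contain a given name, starting from the most significant part. The function name domain_hierarchy is a hypothetical helper, not part of any DNS library.

```python
def domain_hierarchy(name):
    """List the domains containing name, most significant (right-most) first."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(domain_hierarchy("www.dcs.bbk.ac.uk"))
# ['uk', 'ac.uk', 'bbk.ac.uk', 'dcs.bbk.ac.uk', 'www.dcs.bbk.ac.uk']
```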

Name Resolution

  • translation of a domain name into an address is called name resolution
  • the name is said to be resolved to an address
  • software to perform the translation is known as a name resolver (or simply resolver)
  • this software is usually built into the application
  • a resolver uses the DNS protocol to contact a DNS server on port 53
  • e.g., a browser uses a DNS server to map a DNS name to an IP address
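
From a program's point of view, name resolution is usually a single library call. The sketch below uses Python's socket.getaddrinfo, which hands the query to the operating system's resolver; the resolve wrapper is a hypothetical helper. An IP literal is used in the example so that it runs without network access.

```python
import socket

def resolve(name):
    """Return the distinct IP addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

# An IP literal is returned unchanged, without any DNS query.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
# A real name needs network access, and the answer depends on current DNS data:
# print(resolve("www.w3.org"))
```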

DNS Design

  • why is DNS distributed?
  • a simpler design would have been to have one DNS server storing all the mappings
  • problems with this centralised design include:
    • it is a single point of failure
    • the need to handle huge volumes of queries
    • a single server cannot be "close" to all clients
    • it would also have to handle all updates for new hosts

Top-Level Domains

  • right-most domains of the hierarchy are top-level domains:
    • either country-code top-level domain (ccTLD)
    • or generic top-level domain (gTLD)
  • ccTLD represented by two-letter country-codes from ISO 3166, e.g., uk, fr, de, ch
  • gTLD given in RFC 1591; some examples:
    • edu: educational institutions
    • com: commercial entities, i.e., companies
    • net: network providers
    • org: organisations, e.g. NGOs
    • gov: government agencies
    • mil: US military
    • int: organisations established by international treaties

DNS Server Hierarchy

[Figure: DNS server hierarchy]

  • the above figure shows a portion of the hierarchy of DNS servers
  • there are 13 root DNS servers (each is actually a cluster of replicated servers)
    • these return IP addresses of top-level domain servers
  • top-level domain servers are responsible for top-level domains
    • they return IP addresses of authoritative servers for organisations
  • each organisation must provide an authoritative DNS server for its publicly accessible hosts

DNS Server Model (1)

  • each organisation is free to choose how to organise its servers
    • a small organisation might use an ISP to run a DNS server
    • a larger organisation might place all names on a single server
    • a large organisation might divide its names among several servers

DNS Server Model (2)

  • DNS allows each organisation to
    • assign names to computers, or
    • change those names
    without informing a central authority
  • each DNS server contains information linking it to other DNS servers up and down the hierarchy
  • a given server can be replicated
  • replication is useful for heavily used servers, such as root servers

DNS Caching

  • DNS servers employ caching in order to improve performance and reduce load
  • mappings between names and addresses can be cached
  • the length of time a mapping stays in the cache is given by its time to live (TTL)
  • a mapping coming from the authoritative DNS server for a name is called an authoritative answer
  • a mapping coming from the cache of some DNS server is called a non-authoritative answer
  • e.g., one can use nslookup on Windows/Unix-based systems
    nslookup www.dcs.bbk.ac.uk
    ...
    Non-authoritative answer:
    Name:	 www.dcs.bbk.ac.uk
    Address: 193.61.29.21
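
The role of the TTL can be sketched with a toy cache. Everything here is an illustrative assumption rather than how any particular DNS server is implemented: the DnsCache class is hypothetical, the TTL of 300 seconds is invented, and a fake clock is used so that the example does not have to wait.

```python
import time

class DnsCache:
    """Toy name-to-address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # TTL elapsed: forget the mapping
            return None
        return address

now = [0.0]  # fake clock, advanced by hand below
cache = DnsCache(clock=lambda: now[0])
cache.put("www.dcs.bbk.ac.uk", "193.61.29.21", ttl=300)  # TTL value is assumed
print(cache.get("www.dcs.bbk.ac.uk"))  # 193.61.29.21
now[0] = 301.0
print(cache.get("www.dcs.bbk.ac.uk"))  # None
```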
    

Full Name Resolution

  • when nothing is cached, the local name server might have to perform full name resolution

[Figure: DNS full name resolution]
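
One way to picture full name resolution is as a walk down the server hierarchy: ask a root server, follow its referral to a top-level domain server, then to an authoritative server. The sketch below simulates this with dictionaries. All server names and referrals are invented for illustration; only the final address is taken from the nslookup output in these notes, and real DNS messages carry far more detail.

```python
# Toy data: every server name and referral here is invented for illustration.
SERVERS = {
    "root": {"uk": "uk-tld-server"},
    "uk-tld-server": {"bbk.ac.uk": "bbk-authoritative-server"},
    "bbk-authoritative-server": {"www.dcs.bbk.ac.uk": "193.61.29.21"},
}

def lookup(server, name):
    """Return the record for the longest suffix of name that server knows."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SERVERS[server]:
            return SERVERS[server][suffix]
    return None

def resolve_fully(name):
    """Follow referrals from the root until an address is returned."""
    server, path = "root", []
    while server in SERVERS:
        path.append(server)
        answer = lookup(server, name)
        if answer is None:
            return None, path
        server = answer  # either the next server to ask or the final address
    return server, path

address, path = resolve_fully("www.dcs.bbk.ac.uk")
print(path)     # ['root', 'uk-tld-server', 'bbk-authoritative-server']
print(address)  # 193.61.29.21
```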

Internet e-mail

  • e-mail client responsible for
    • retrieving mail from server (POP3, IMAP4)
    • sending mail to server (SMTP)
  • e-mail server responsible for
    • collecting mail from client (SMTP)
    • distributing mail to client (POP3, IMAP4)
    • relaying mail between e-mail servers (SMTP)

Sending e-mail

  • SMTP (Simple Mail Transfer Protocol)
  • defined in RFC 821 and RFC 822 (1982), superseded by RFC 2821 and RFC 2822 (2001)
  • use mailto: prefix in URI in browser
  • uses TCP port 25
  • address of recipient is of the form
    name@dept.inst.ac.uk
  • uses DNS (Domain Name System) to map domain name to IP address

Example SMTP Session

  • mail message is transferred from user John_Q_Smith on computer example.edu to two users on computer somewhere.com

Email Representation Standards

  • two important standards exist
    • RFC (Request For Comments) 2822 mail message format
    • Multi-purpose Internet Mail Extensions (MIME)
  • RFC 2822 format comprises
    • a header section
    • a blank line
    • and a body
  • header lines each have the form
    keyword: information
    
    where keywords include From, To, Subject, Cc
  • the mail message (including headers) makes up the DATA as sent by SMTP
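
The header section, blank line and body can be seen by parsing a small message with Python's standard email package. This is a minimal sketch: the sender comes from the SMTP example above, while the recipient name, subject and body are hypothetical.

```python
from email import message_from_string

raw = (
    "From: John_Q_Smith@example.edu\n"
    "To: Mary_Jones@somewhere.com\n"  # hypothetical recipient
    "Subject: Test message\n"
    "\n"                              # blank line ends the header section
    "Hello.\n"
)

msg = message_from_string(raw)
print(msg["From"])        # John_Q_Smith@example.edu
print(msg["Subject"])     # Test message
print(msg.keys())         # ['From', 'To', 'Subject']
print(repr(msg.get_payload()))  # 'Hello.\n'
```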

Multi-purpose Internet Mail Extensions (MIME)

  • SMTP originally only used the 7-bit ASCII format
  • inadequate for non-English and non-textual data
  • MIME was defined in RFCs 2045, 2046, 2047, 2048, 2049; allows
    • non-ASCII message bodies
    • extensible set of different formats for non-textual bodies
    • multi-part message bodies
    • non-ASCII textual header information

MIME Headers

  • MIME headers include:
    • MIME-Version
    • Content-Type: specifies a type and subtype
    • Content-Transfer-Encoding: specifies auxiliary encoding for transfer
  • the content of the Content-Type header is the MIME type
  • examples of MIME types are text/html, image/gif and multipart/mixed
  • example of Content-Transfer-Encoding is base64:
    • preferred encoding for 8-bit binary data
    • each group of 3 bytes (24 bits) is encoded as 4 ASCII characters
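
These headers can be observed by building a message with a binary attachment using Python's email.message.EmailMessage. This is a sketch under assumptions: the recipient, subject, file name and the three attachment bytes are all made up, and the library chooses the transfer encoding for binary data itself.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "John_Q_Smith@example.edu"
msg["To"] = "Mary_Jones@somewhere.com"      # hypothetical recipient
msg["Subject"] = "MIME example"
msg.set_content("See the attached bytes.")  # becomes a text/plain part
msg.add_attachment(
    bytes([0x5A, 0x8A, 0x1D]),              # three arbitrary bytes
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",                    # hypothetical file name
)

print(msg["MIME-Version"])     # 1.0
print(msg.get_content_type())  # multipart/mixed
attachment = next(msg.iter_attachments())
print(attachment.get_content_type())            # application/octet-stream
print(attachment["Content-Transfer-Encoding"])  # base64
```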

Base64 Encoding

     0x00  0x10  0x20  0x30
0    A     Q     g     w
1    B     R     h     x
2    C     S     i     y
3    D     T     j     z
4    E     U     k     0
5    F     V     l     1
6    G     W     m     2
7    H     X     n     3
8    I     Y     o     4
9    J     Z     p     5
A    K     a     q     6
B    L     b     r     7
C    M     c     s     8
D    N     d     t     9
E    O     e     u     +
F    P     f     v     /

  • the table above is used in base64 encoding
  • values in the top row and leftmost column are hexadecimal numbers; their sum is the 6-bit value encoded by the character in that cell
  • range of values is 0x00 to 0x3F (111111), i.e., 0 to 63
  • example: encode 01011010, 10001010, 00011101 as follows
    1. split into four 6-bit values:
      010110, 101000, 101000, 011101
    2. convert to hex: 0x16, 0x28, 0x28, 0x1D
    3. use the table to encode: W, o, o, d
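
The worked example can be checked in Python. The sketch below encodes one 3-byte group by hand, following the same three steps, and compares the result with the standard base64 module. The encode_group helper is a hypothetical name and handles only complete 3-byte groups, so it does not deal with padding.

```python
import base64

# The 64 characters of the table, read column by column.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(three_bytes):
    """Encode exactly 3 bytes (24 bits) as 4 base64 characters."""
    assert len(three_bytes) == 3
    bits = int.from_bytes(three_bytes, "big")
    # Take the four 6-bit values from most to least significant.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets), sextets

data = bytes([0b01011010, 0b10001010, 0b00011101])
text, sextets = encode_group(data)
print([hex(s) for s in sextets])  # ['0x16', '0x28', '0x28', '0x1d']
print(text)                       # Wood
print(base64.b64encode(data))     # b'Wood'
```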

Links to more information

Chapter 4 of [Comer] and Chapter 2 of [Kurose and Ross].