So, what really happens when you type a URL into your browser's address bar?

This is a pretty popular interview question, one I have both asked and answered, so I decided to write a short (at least I hope so) article that covers as much of it as possible.

We will omit everything related to the keyboard, operating system, browser autocomplete and preload functions and go straight to the networking part of this question.

Browser parses the URL

First, the browser tries to work out whether the entered text is a URL or a search term. If it is the latter, the browser uses the default search engine and sends the request as soon as you hit Enter. The mechanism of this request is described in the following parts of this article, as is the case where the entered text is a valid URL. If the hostname contains non-ASCII characters, the browser encodes it using Punycode.
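To make this step concrete, here is a minimal sketch in Python. The helper names (looks_like_url, to_ascii_host) are made up for illustration, and the heuristic is far cruder than what a real browser does; the Punycode part relies on Python's built-in "idna" codec.

```python
from urllib.parse import urlsplit


def looks_like_url(text: str) -> bool:
    """Rough heuristic (an assumption, not browser logic): scheme + host, no spaces."""
    text = text.strip()
    if " " in text:
        return False
    parts = urlsplit(text)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def to_ascii_host(hostname: str) -> str:
    """Encode a Unicode hostname, label by label, using Punycode (IDNA)."""
    return hostname.encode("idna").decode("ascii")


print(looks_like_url("https://example.com/path"))  # True
print(looks_like_url("how do sockets work"))       # False
print(to_ascii_host("bücher.example"))             # xn--bcher-kva.example
```

Anything that fails the check would be handed to the search engine instead.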

DNS lookup

Browsers have their own internal DNS cache, so browsers check it first (see chrome://net-internals/#dns in Chrome).

Then the browser calls the OS function gethostbyname to do the lookup. This function does a couple of things in the following order:

Check local DNS cache
Check the local /etc/hosts file
Make a request to the DNS server configured in the network stack (local router or ISP caching DNS server). If the DNS server is on the same subnet, the network library follows the ARP process described below for the DNS server; otherwise it follows the same process for the default gateway IP.
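The lookup order above can be sketched roughly like this. It is a toy model, not what the OS resolver actually runs: parse_hosts and resolve are hypothetical helpers, the cache is a plain dict, and the real DNS request is replaced by a function you pass in.

```python
from typing import Callable, Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file text into a {hostname: ip} mapping."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def resolve(
    name: str,
    cache: Dict[str, str],
    hosts: Dict[str, str],
    query_dns: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the cache, then the hosts table, then the configured DNS server."""
    name = name.lower()
    if name in cache:
        return cache[name]
    if name in hosts:
        return hosts[name]
    ip = query_dns(name)  # stand-in for the real network request
    if ip is not None:
        cache[name] = ip  # remember the answer for next time
    return ip


hosts = parse_hosts("127.0.0.1 localhost  # loopback\n10.0.0.5 build.internal\n")
cache: Dict[str, str] = {}
print(resolve("build.internal", cache, hosts, lambda n: None))    # 10.0.0.5
print(resolve("example.com", cache, hosts, lambda n: "192.0.2.1"))  # 192.0.2.1
```

The second call falls through to the fake DNS function and leaves the answer in the cache, so a repeated lookup never reaches the network.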

ARP

The Address Resolution Protocol (ARP) is a communication protocol used for discovering the link layer address, such as a MAC address, associated with a given IP address. It operates within the boundaries of a single network and is never routed across internetworking nodes. So, basically, you got the IP address in the previous step and now you want to know which computer (identified by its MAC address) to send the data to.

The ARP cache is first checked for an entry for our target IP. If it is in the cache, the library function returns the result: Target IP = MAC. Otherwise, it checks whether the target IP is on any of the subnets in the local route table. If not, the network library uses the interface that has the default gateway. Then a Layer 2 (Data Link Layer) ARP request is sent:

Sender MAC: sender:interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here

Then there is a process of finding the router, which depends on your network devices (we will omit that part for simplicity; if you want to know more, read up on the difference between a hub, a switch and a router). When the request reaches the router, it replies with the following:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: sender:interface:mac:address:here
Target IP: interface.ip.goes.here
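For the curious, here is roughly what that request looks like as bytes. This is a sketch that only builds the frame, it does not send anything, and the function names and addresses are made up. One detail worth knowing: on the wire the broadcast address FF:FF:FF:FF:FF:FF is the Ethernet destination, while the target hardware field inside the ARP payload of a request is normally all zeros.

```python
import ipaddress
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def arp_target_ip(dest_ip: str, local_network: str, gateway_ip: str) -> str:
    """ARP for the destination itself if it is on our subnet, else for the gateway."""
    if ipaddress.ip_address(dest_ip) in ipaddress.ip_network(local_network):
        return dest_ip
    return gateway_ip


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request (14 + 28 bytes)."""
    sender = mac_to_bytes(sender_mac)
    ethernet = BROADCAST_MAC + sender + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        1,        # operation: 1 = request, 2 = reply
        sender,
        ipaddress.ip_address(sender_ip).packed,
        bytes(6),  # target MAC is unknown, so it is left as zeros
        ipaddress.ip_address(target_ip).packed,
    )
    return ethernet + arp


target = arp_target_ip("8.8.8.8", "192.168.1.0/24", "192.168.1.1")
frame = build_arp_request("aa:bb:cc:dd:ee:ff", "192.168.1.10", target)
print(target, len(frame))  # 192.168.1.1 42
```

The reply has the same layout with the operation field set to 2 and the sender fields filled in by whoever owns the target IP.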

Back to the DNS

Now that we know where to send our DNS request, we can continue:

The DNS client establishes a socket to UDP port 53 on the DNS server, using a source port above 1023.
If the response size is too large, TCP will be used instead.
If the local/ISP DNS server does not have the answer, a recursive search is requested, which flows up the list of DNS servers until the SOA is reached; if an answer is found, it is returned.

By the way, the search is called recursive because this kind of server is called a recursive DNS server. There are many thousands of recursive DNS servers in the world. On the other hand, there are authoritative DNS nameservers, which provide the recursive DNS servers with the answers.
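The steps above can be sketched with nothing but the standard library. This builds a minimal query for an A record by hand; build_query, send_query and is_truncated are names I made up, the query ID is arbitrary, and a real resolver does much more (retries, random IDs, parsing the answer).

```python
import socket
import struct

FLAG_RD = 0x0100  # recursion desired
FLAG_TC = 0x0200  # response was truncated


def encode_qname(hostname: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Header (12 bytes) plus one question: type A (1), class IN (1)."""
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 0)
    return header + encode_qname(hostname) + struct.pack("!HH", 1, 1)


def is_truncated(response: bytes) -> bool:
    """If the TC flag is set, the client should retry over TCP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & FLAG_TC)


def send_query(hostname: str, server: str, timeout: float = 2.0) -> bytes:
    """Send the query to UDP port 53; the OS picks the source port for us."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        response, _ = sock.recvfrom(512)
        return response


print(build_query("example.com").hex())
```

Calling send_query("example.com", "<your DNS server IP>") would return the raw response bytes; is_truncated tells you whether the fallback to TCP is needed.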

Now we have everything we need to request the destination server and we can proceed to the sockets part of this article.

Opening a socket

To open a socket, we need an IP address and a port. We can get both from the URL. HTTP uses port 80 by default, and HTTPS uses port 443 (we will discuss it in the TLS section later). We also need a source port for the socket, but this is assigned by the OS from its dynamic port range (ip_local_port_range in Linux).

There is a special system library called socket, and to use TCP we need two constants: AF_INET or AF_INET6 (the address family) and SOCK_STREAM (the socket type).

Then we add the destination port to the packet header on the Transport Layer (TCP), and the destination IP to the packet header on the Network Layer (IP). Finally, on the Data Link Layer, we wrap this data in a frame and add a frame header with both the MAC address of the source network interface and the MAC address of the local router (we get that MAC address using ARP, as described in the previous section).
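In Python this looks roughly as follows. It is a sketch: endpoint_from_url and open_tcp are names I made up, and error handling is kept to the bare minimum. Using getaddrinfo means we do not have to choose between AF_INET and AF_INET6 ourselves.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_from_url(url: str):
    """Return (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def open_tcp(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Try each resolved address until one TCP connection succeeds."""
    last_error = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)  # AF_INET or AF_INET6
        sock.settimeout(timeout)
        try:
            sock.connect(addr)  # the OS assigns the source port here
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses found")


print(endpoint_from_url("https://example.com/index.html"))  # ('example.com', 443)
print(endpoint_from_url("http://example.com:8080/"))        # ('example.com', 8080)
```

After open_tcp returns, sock.getsockname() shows the source IP and the ephemeral source port the OS picked.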

Let's look at the TCP and IP packet formats:

[Figure: TCP packet]

[Figure: IP packet]
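Both headers are fixed binary layouts, so they can be written down with struct. This is a sketch of the minimal 20-byte headers with no options; the function names and addresses are invented, and the TCP checksum is left at zero because computing it needs a pseudo-header that is out of scope here.

```python
import socket
import struct

SYN, ACK, FIN = 0x02, 0x10, 0x01


def checksum16(data: bytes) -> int:
    """Internet checksum: ones' complement of the ones' complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64) -> bytes:
    fields = [
        (4 << 4) | 5,         # version 4, header length 5 * 4 = 20 bytes
        0,                    # DSCP/ECN
        20 + payload_len,     # total length
        0,                    # identification
        0,                    # flags + fragment offset
        ttl,
        socket.IPPROTO_TCP,   # protocol number 6
        0,                    # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = checksum16(struct.pack("!BBHHHBBH4s4s", *fields))
    return struct.pack("!BBHHHBBH4s4s", *fields)


def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int, flags: int) -> bytes:
    offset_flags = (5 << 12) | flags  # data offset 5 words, then the flag bits
    # window = 65535, checksum = 0 (not computed in this sketch), urgent = 0
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack, offset_flags, 65535, 0, 0)


tcp = build_tcp_header(50000, 443, seq=1000, ack=0, flags=SYN)
ip = build_ipv4_header("192.168.1.10", "93.184.216.34", len(tcp))
print(len(ip), len(tcp))  # 20 20
```

A nice property of the checksum: running checksum16 over a header that already contains a correct checksum gives 0, which is how a receiver verifies it.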

IP

When the packet reaches the router that manages the local subnet, it continues on to the border routers of the AS (autonomous system) and then finally to the destination subnet and server. Each router extracts the destination IP address from the IP header of the packet and routes it to the next hop.

There is also a TTL (time to live) field in the IP header. The name is a bit confusing, as the field actually defines the maximum number of hops between source and destination. When a packet reaches a router, the value of this field is decremented. If the value reaches zero, the packet is dropped (it can also be dropped because of network congestion).
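A toy model of the TTL rule, just to pin down the arithmetic. The forward function is made up for illustration, and every router is assumed to decrement the TTL by exactly one.

```python
def forward(ttl: int, routers_on_path: int):
    """Return (delivered, routers_reached) for a packet with the given TTL."""
    for router in range(1, routers_on_path + 1):
        ttl -= 1              # each router decrements the TTL
        if ttl == 0:
            return False, router  # dropped at this router
    return True, routers_on_path


print(forward(64, 12))  # (True, 12)
print(forward(3, 5))    # (False, 3)
```

With a TTL of 3 the packet dies at the third router, no matter how long the path is.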

TCP

Let's look a bit closer at the TCP connection flow:

The client generates an ISN (initial sequence number) and sends it to the server in a segment with the SYN flag set.
The server receives the SYN, chooses its own ISN, puts the client's ISN+1 in the ACK field and sets the SYN and ACK flags.
Then the client sets SEQ to its own ISN+1, puts the server's ISN+1 in the ACK field and sets the ACK flag.

So, when we send data over TCP, we increase SEQ by the number of bytes sent, and the other side acknowledges it by sending back, in its ACK field, the sequence number of the next byte it expects. When we want to close the connection, we send a FIN packet; the other side sends us an ACK packet and its own FIN packet. Then we ACK this FIN packet, and the connection is closed.
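The numbers in the handshake are easier to follow with concrete values. The sketch below only models the SEQ and ACK fields, nothing is sent; Segment, three_way_handshake and ack_for are illustrative names, and the ISNs are arbitrary.

```python
from dataclasses import dataclass

MOD = 2**32  # sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: int


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments of the handshake with their SEQ/ACK values."""
    return [
        Segment("client", "SYN", client_isn, 0),
        Segment("server", "SYN,ACK", server_isn, (client_isn + 1) % MOD),
        Segment("client", "ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD),
    ]


def ack_for(seq: int, payload_len: int) -> int:
    """ACK number the receiver sends back: the next byte it expects."""
    return (seq + payload_len) % MOD


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
print(ack_for(1001, 300))  # 1301
```

So after the handshake the client's first data byte has SEQ 1001, and sending 300 bytes is acknowledged with ACK 1301.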

A couple of words about TCP congestion control. Network congestion may occur when a sender floods the network with too many packets. During congestion, the network cannot handle the traffic properly, which results in a degraded quality of service (QoS). The typical symptoms of congestion are excessive packet delay, packet loss and retransmission.

TCP uses a technique called slow start. It is designed to gradually expand the amount of data traversing the wire each round trip. The initial window (the amount of data sent before waiting for an acknowledgement) is around 16 KB and doubles on subsequent round trips until a maximum size is reached. This can vary, but tends to be around 4 MB for most connections.

This process is used because the server does not know how much bandwidth the client can handle. Rather than overwhelming the client, the server starts with a small window and keeps increasing it until a limit is found.
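The doubling is easy to see in a few lines. This is a deliberately simplified model: no packet loss, no slow start threshold, and the limit parameter just stands in for whatever cap applies to the connection. The units are whatever you pass in (segments here).

```python
def slow_start(initial_window: int, limit: int):
    """Window size at each round trip: doubles until it hits the limit."""
    windows = []
    cwnd = initial_window
    while cwnd < limit:
        windows.append(cwnd)
        cwnd *= 2  # one doubling per round trip
    windows.append(limit)  # from here on the window is capped
    return windows


print(slow_start(10, 100))  # [10, 20, 40, 80, 100]
```

Starting from 10 segments, it takes only four round trips to go past 80, which is why slow start is not actually that slow.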

<Here I need to explain congestion window, CUBIC algorithm>

TLS (1.2 and 1.3)

TLS, the successor of SSL, is a protocol that provides a secure mechanism for authentication using X.509 certificates. It also provides a two-way encrypted channel between two parties. Two-way encryption means that either side can encrypt data into an unreadable form and the other side can decrypt it back into readable form on receipt, so that no third party can eavesdrop on the conversation.

It is normally implemented on top of TCP, with TLS records carried in TCP segments, in order to encrypt Application Layer protocols such as HTTP.

The TLS handshake is a conversation between the client and the server whose goal is to establish a secure connection using symmetric encryption.

Handshake process (TLS 1.2 with RSA key exchange):

The client computer sends a ClientHello message to the server with its Transport Layer Security (TLS) version, list of cipher algorithms and compression methods available.
The server replies with a ServerHello message to the client with the TLS version, selected cipher, selected compression methods and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that will be used by the client to encrypt the rest of the handshake until a symmetric key can be agreed upon.
The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts it with the server's public key. These random bytes are used to determine the symmetric key.
The server decrypts the random bytes using its private key and uses these bytes to generate its own copy of the symmetric master key.
The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.
The server generates its own hash, and then decrypts the client-sent hash to verify that it matches. If it does, it sends its own Finished message to the client, also encrypted with the symmetric key.
From now on the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
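From the application's point of view, all of the steps above hide behind a single call. Here is a sketch using Python's ssl module; make_context and tls_info are names I made up, and requiring at least TLS 1.2 is my choice for the example, not something the protocol demands.

```python
import socket
import ssl


def make_context() -> ssl.SSLContext:
    """Client context that verifies the certificate chain and the hostname."""
    context = ssl.create_default_context()  # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def tls_info(host: str, port: int = 443, timeout: float = 5.0):
    """Connect, run the handshake, and report what was negotiated."""
    context = make_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the handshake and sends the hostname (SNI)
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


context = make_context()
print(context.verify_mode, context.check_hostname)
```

Calling tls_info("example.com") on a machine with network access returns the negotiated protocol version and cipher suite name; if the certificate cannot be verified against the trusted CAs, wrap_socket raises ssl.SSLCertVerificationError instead.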

TLS 1.2/1.3 Handshake:

[Figure: TLS 1.2 and 1.3]

In TLS 1.3, many ciphers and algorithms that are practically or theoretically vulnerable have been removed, for example: RSA key exchange, the RC4 stream cipher, CBC (block) mode ciphers, the SHA-1 hash function, various non-ephemeral Diffie-Hellman groups, MD5, DES, 3DES, and EXPORT-strength ciphers.

Some other differences between TLS 1.2 and TLS 1.3 are:

Support for outdated ciphers and algorithms eliminated.
RSA key exchange is eliminated, and Perfect Forward Secrecy becomes mandatory.
The handshake needs fewer round trips.
AEAD bulk encryption is mandated, and block mode ciphers are eliminated.
Key derivation is redesigned around HKDF (HMAC-based extract-and-expand key derivation).
A 1-RTT handshake and 0-RTT (zero round trip) resumption are offered.
Support for additional elliptic curves.
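Since HKDF is mentioned above, here is what its two steps look like. Be aware this is plain HKDF as defined in RFC 5869 with SHA-256, not the full TLS 1.3 key schedule, which wraps the expand step in its own labels; the inputs in the example are placeholders.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, input_key_material: bytes) -> bytes:
    """Extract: condense the input secret into a fixed-size pseudo-random key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: an empty salt means HASH_LEN zero bytes
    return hmac.new(salt, input_key_material, HASH).digest()


def hkdf_expand(pseudo_random_key: bytes, info: bytes, length: int) -> bytes:
    """Expand: stretch the pseudo-random key into `length` bytes of key material."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(pseudo_random_key, block + info + bytes([counter]), HASH).digest()
        output += block
        counter += 1
    return output[:length]


prk = hkdf_extract(b"placeholder salt", b"placeholder shared secret")
key = hkdf_expand(prk, b"example key", 16)
iv = hkdf_expand(prk, b"example iv", 12)
print(len(prk), len(key), len(iv))  # 32 16 12
```

The info argument is what lets one shared secret produce several independent keys, which is the property the handshake relies on.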

<HTTP>