HTTP

HTTP is an extensible protocol that relies on concepts like resources and Uniform Resource Identifiers (URIs), simple message structure, and client-server communication flow.

A little bit of history

HyperText Transfer Protocol – HTTP is what puts the Net into the Internet. Without it, we would probably still be using The Encyclopedia Britannica to do our homework. It is the basis for almost all web applications and the method computers and servers use to send and receive information. For example, when you access https://woocommerce.com in your browser, your browser sends an HTTP request to the WooCommerce.com servers to access the page’s content. Then, the WooCommerce.com servers (which are actually WordPress.com servers, because the WooCommerce site is hosted with WordPress.com) send an HTTP response with all the page’s content: text, images, links, and also the formatting that makes it look so nice once it’s displayed in your browser.

The HTTP protocol was developed between 1989 and 1991 by Tim Berners-Lee and his team, and it has gone through many changes ever since. Back then, Tim was working at CERN (Conseil Européen pour la Recherche Nucléaire), and he wrote a proposal to build a hypertext system over the internet that was initially called Mesh and later renamed the World Wide Web (WWW).

It was built on top of the TCP and IP protocols and consisted of four building blocks:

  • A textual format to represent hypertext documents: the HyperText Markup Language (HTML).
  • A simple protocol to exchange these documents: the HyperText Transfer Protocol (HTTP).
  • A client to display (and edit) these documents: the first web browser, called WorldWideWeb.
  • A server to give access to the documents.

These four blocks were completed by the end of 1990, and by 1991 the first servers outside CERN were up and running. On August 6, 1991, Tim Berners-Lee posted on the public alt.hypertext newsgroup, and this is now considered the official start of the World Wide Web as a public project.

The first version was extremely simple; it was later called HTTP/0.9 and is also referred to as the one-line protocol.

GETting there

When I say the first version was extremely simple, I mean it. Each request consisted of one single line: the only possible method, GET, followed by the path to the resource.

GET /home.html

The response was extremely simple, too: it only consisted of the file itself:

<html>
  A very simple HTML page
</html>

There were no headers, which meant only HTML files could be transmitted. There were no status or error codes either. If something went wrong, the server sent back a specially generated HTML file containing a description of the problem.
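To make the one-line exchange concrete, here is a minimal sketch of how an HTTP/0.9-style server could answer a request. It is an illustration, not a real server: the `DOCUMENTS` dictionary and the `handle_http09` function are hypothetical stand-ins for files on disk, and the wording of the error pages is invented.

```python
# Hypothetical in-memory "file system" standing in for documents on disk.
DOCUMENTS = {
    "/home.html": "<html>\n  A very simple HTML page\n</html>\n",
}


def handle_http09(request_line: str) -> str:
    """Answer a one-line HTTP/0.9 request with the raw document and nothing else."""
    parts = request_line.split()
    # A valid HTTP/0.9 request is exactly "GET <path>": no version, no headers.
    if len(parts) != 2 or parts[0] != "GET":
        return "<html>\n  Bad request: only GET is supported\n</html>\n"
    path = parts[1]
    if path not in DOCUMENTS:
        # No status codes exist, so the error is described in an HTML page.
        return f"<html>\n  Not found: {path}\n</html>\n"
    # Success: the response is just the file, with no status line or headers.
    return DOCUMENTS[path]


print(handle_http09("GET /home.html"))
```

Notice that a client has no reliable way to tell the error page apart from a real document, which is exactly the gap that status codes later filled.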

HTTP/1.0

Even though HTTP/0.9 was super simple, it evolved fairly quickly and became more versatile. HTTP/1.0 started sending version information with each request (HTTP/1.0 was appended to the GET line), and a status code line was also sent at the beginning of the response. This way, the browser could tell whether a request had succeeded or failed and adapt its behaviour accordingly. The concept of HTTP headers was added to both requests and responses, so metadata could be transmitted and the protocol became very flexible and extensible. Thanks to the Content-Type header, documents other than plain HTML files could be transmitted.

A typical request and response would then look like this:

GET /home.html HTTP/1.0
User-Agent: Netscape/2.0 (Windows 3.1; rv:1.0)

HTTP/1.0 200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: SERVERNAME/3.0 libwww/2.17
Content-Type: text/html

<HTML>
A page with an image
  <IMG SRC="/image.gif">
</HTML>

YES, I Googled what the user-agent for Netscape looked like XD
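With a status line and headers in place, a client now has to split every response into three parts: the status line, the headers, and the body, which begins after the first empty line. The sketch below does that for a sample response using the full status line `HTTP/1.0 200 OK`. `parse_response` is a hypothetical helper written for this post, not a library function, and it skips real-world details such as repeated headers or binary bodies.

```python
def parse_response(raw: str):
    """Split a raw HTTP/1.x response into (version, status, reason, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF on the wire).
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # e.g. "HTTP/1.0 200 OK" -> "HTTP/1.0", "200", "OK"
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Tue, 15 Nov 1994 08:12:31 GMT\r\n"
    "Server: SERVERNAME/3.0 libwww/2.17\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<HTML>\r\nA page with an image\r\n  <IMG SRC=\"/image.gif\">\r\n</HTML>\r\n"
)

version, status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"])
```

The status code is what lets the browser "change its behaviour accordingly", and the Content-Type header tells it how to interpret the body.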

Between 1991 and 1995, these changes were introduced through trial and error. Interoperability was a huge issue, since different browsers would try different things. In an attempt to solve these problems, a document describing best and common practices was published in 1996: RFC 1945, which defined HTTP/1.0.

HTTP/1.1

In parallel to all the implementations that were going on with HTTP/1.0, proper standardization was being worked on. The first standardized version of HTTP, HTTP/1.1, was published in early 1997. It clarified several ambiguities and introduced a bunch of new things.

  • A connection could be reused, which saved time. It no longer needed to be opened multiple times to display the resources embedded in the single original document.
  • Pipelining was added. This allowed a second request to be sent before the answer to the first one was fully transmitted. This lowered the latency of the communication.
  • Chunked responses were also supported, letting a server send a body in pieces without knowing its total length in advance.
  • Additional cache control mechanisms were introduced.
  • Content negotiation, including language, encoding, and type, was introduced. A client and a server could now agree on which content to exchange.
  • Thanks to the Host header, different domains could be hosted from the same IP address, which allowed server colocation.
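Chunked responses are easier to picture with an example. With `Transfer-Encoding: chunked`, the body arrives as a series of chunks, each preceded by its size in hexadecimal and a CRLF, and a chunk of size 0 marks the end. The decoder below is a simplified sketch: `decode_chunked` is a name chosen for this example, and it discards chunk extensions and ignores any trailer headers.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";")[0]  # drop any chunk extension
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk; trailer headers that may follow are ignored here.
            break
        body += data[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(body)


encoded = b"7\r\nMozilla\r\n9\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'MozillaDeveloperNetwork'
```

Because the sender announces each piece separately, it can start streaming a page before the whole thing has been generated.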

HTTP/1.1 was first published as RFC 2068 in January 1997 and it’s still in use today.

The flexibility of HTTP allowed for the development of additional headers and methods. Apart from two revisions, RFC 2616 in June 1999 and RFCs 7230 to 7235 in June 2014, the protocol remained remarkably stable for over 15 years before the introduction of HTTP/2.

One of the largest changes to HTTP happened at the end of 1994, when a company called Netscape (tears and nostalgia) created an encrypted transmission layer on top of it: SSL. SSL 1.0 was never released to the public, but SSL 2.0 and SSL 3.0 made it possible to create e-commerce sites by encrypting and guaranteeing the authenticity of the messages exchanged between the server and the client. SSL was eventually standardized and became TLS.

The year 2000 brought not only the millennium bug but also a new pattern for using HTTP: representational state transfer, or REST. REST APIs were not based on new HTTP methods; instead, they relied on access to specific URIs with the basic HTTP/1.1 methods. This allowed any web application to expose an API for retrieving and modifying its data without having to update browsers or servers.

HTTP/2

Years went by and, like (almost) everything else, things evolved: web pages became more complex, with some becoming applications in their own right.

The volume of intricate files, visual media and scripts being used increased substantially. A lot more data was being transmitted over HTTP requests which resulted in more overhead for HTTP/1.1 connections.

In an attempt to address this, Google implemented an experimental protocol called SPDY (pronounced "speedy") in the early 2010s. It improved responsiveness and solved the problem of duplicate data transmission, and it served as the foundation for the HTTP/2 protocol.

The HTTP/2 protocol differs from HTTP/1.1 in a few ways:

  • It’s a binary protocol rather than a text protocol, so it can’t be read or written by hand. Despite this hurdle, it allows for the implementation of improved optimization techniques.
  • It’s a multiplexed protocol. Parallel requests can be made over the same connection, removing the constraints of the HTTP/1.x protocol.
  • It compresses headers. As these are often similar among a set of requests, this removes the duplication and overhead of data transmitted.
  • It allows a server to populate data in a client cache through a mechanism called the server push.
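To give a feel for what "binary" and "multiplexed" mean in practice: every HTTP/2 frame starts with a fixed 9-byte header holding the payload length (24 bits), the frame type (8 bits), flags (8 bits), and a 31-bit stream identifier. The stream identifier is what lets many requests share one connection, since frames from different streams can be interleaved. The parser below is a sketch based on that layout only; it decodes the frame header and does not implement header compression (HPACK) or any other part of the protocol.

```python
import struct
from typing import NamedTuple

# Frame type codes from the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


class FrameHeader(NamedTuple):
    length: int       # size of the payload that follows, in bytes
    frame_type: str
    flags: int
    stream_id: int    # which request/response exchange this frame belongs to


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header at the start of every HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    # Big-endian: 24-bit length (read as 1 + 2 bytes), type, flags, 32-bit stream word.
    length_hi, length_lo, type_code, flags, stream_word = struct.unpack(">BHBBI", data[:9])
    return FrameHeader(
        length=(length_hi << 16) | length_lo,
        frame_type=FRAME_TYPES.get(type_code, f"UNKNOWN(0x{type_code:x})"),
        flags=flags,
        stream_id=stream_word & 0x7FFFFFFF,  # the top bit is reserved
    )


# A HEADERS frame with a 16-byte payload on stream 1, END_HEADERS flag (0x4) set.
print(parse_frame_header(bytes([0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])))
```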

HTTP/2 was officially standardized in May 2015, and its usage peaked in January 2022. The early adopters were very high-traffic sites. HTTP/2 did not require changes to websites to be adopted, which made things easier: anyone with a recent browser and an up-to-date server could use it.

GETting more technical

If all the things I mentioned above made no sense to you, fear not. I will try to explain the more technical bits. The operative word is TRY.

Let’s go back to the beginning.

We talked about the client and server sides. So, let’s establish that on the client side, there’s you, and you have a computer with a browser installed on it. Let’s say you are using Google Chrome. On the server side, there are, well… servers, which are basically computers that store tons of information and provide it to the client upon request.

So, when you sit in front of your computer (or stand, no judgment), open Chrome, and type Google.com, what happens in the background is that your browser sends a request to a server saying:

Howdy Mr. Server, I have someone asking to access a webpage here. The human address is http://Google.com. Could you kindly send back the information this page contains?

OR, in a more technical way:

GET /xxx HTTP/1.1
Host: google.com
User-Agent: Chrome/5.0 (Windows 10)
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: keep-alive
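On the wire, the lines of a request like this one sit back to back, each ending with a carriage return and line feed (CRLF), and a single empty line marks the end of the headers. The sketch below assembles such a request as bytes. `build_request` is a hypothetical helper for this post, it only handles requests without a body, and it uses `/` (the home page) as the path; actually sending the bytes would require opening a TCP connection, which is left out.

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> bytes:
    """Serialise an HTTP/1.1 request line plus headers (no body) to bytes."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; the extra CRLF is the blank line that ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/",  # the home page
    {
        "Host": "google.com",
        "User-Agent": "Chrome/5.0 (Windows 10)",
        "Accept-Language": "en-US",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
    },
)
print(request.decode("ascii"))
```

The empty line matters: it is how the server knows the headers are over and the (optional) body begins.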

The server will then send back a Response (or not, if something goes wrong) containing the requested page. The browser then fetches the page’s sub-documents, for instance text, layout descriptions, images, videos, scripts, and more, and reconstructs them into the page that is displayed, hopefully correctly, to you.

Clients and servers communicate by exchanging individual messages as opposed to a stream of data.

HTTP is a protocol, a method to accomplish this message exchange. It runs over TCP (Transmission Control Protocol), a network protocol that connects two hosts so they can exchange data. It can also run over TLS, which adds encryption (security) on top of a TCP connection.

Client – The user-agent

The user-agent is any tool that acts on behalf of the user. As mentioned before, this role is usually performed by a web browser.

The user agent is always the one that initiates the communication. A request is never initiated by the server, although some mechanisms have been added over the years to simulate server-initiated messages.

To show a web page, the browser first asks for the main HTML document that represents the page. It then examines this file and asks for any additional things needed to display the page correctly via additional requests, like the design information (CSS), pictures, videos, etc. The browser puts all these pieces together to create the full web page. If there are any other instructions in the code, like scripts, the browser follows those and may fetch more things as needed. This process continues until the web page is completely loaded and displayed.
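The "examine the main document, then ask for more" step can be sketched with Python's standard-library `html.parser`. This is a rough illustration under simplifying assumptions: it only looks at `img`, `script`, `video`, and stylesheet `link` tags, whereas a real browser discovers resources in many more ways (inside CSS, via scripts, and so on). The file names in the sample page are made up.

```python
from html.parser import HTMLParser


class SubresourceFinder(HTMLParser):
    """Collect URLs of resources a page needs (stylesheets, images, scripts)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script", "video") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.urls.append(attributes["href"])


page = """
<html>
  <head><link rel="stylesheet" href="/style.css"></head>
  <body>
    <img src="/logo.png">
    <script src="/app.js"></script>
  </body>
</html>
"""

finder = SubresourceFinder()
finder.feed(page)
print(finder.urls)  # ['/style.css', '/logo.png', '/app.js']
```

Each URL in that list would turn into one more HTTP request before the page is complete.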

A web page is a hypertext document, which is like a special text document on the internet. It has some parts that work like magic doors (links). When you click on these parts (usually with your mouse), they take you to new web pages, letting you explore the internet. Your web browser turns these clicks into new HTTP requests and interprets the responses to present you with the result.

Servers

On the other side of the communication is the server (or servers), which provides the client with a response. A server is not necessarily a single machine, although it appears to be one from the outside. Physically, it can be a collection of servers sharing the load.

Conversely, depending on the load, one physical machine can host several server software instances.

Proxies

Oh, yeah, sorry. There’s a third character in this drama: the proxies. A proxy server is like a middleman between your computer or device and the internet. It acts as an intermediary that helps you make requests to websites or other internet services. Instead of connecting directly to a website, your device connects to the proxy server, and the proxy server then connects to the website on your behalf. Proxies can be transparent, meaning they forward the message without altering it, or non-transparent, meaning they alter the request in some way before forwarding it to the server.

Proxies may perform numerous functions:

  • caching (the cache can be public or private, like the browser cache)
  • filtering (like an antivirus scan or parental controls)
  • load balancing (to allow multiple servers to serve different requests)
  • authentication (to control access to different resources)
  • logging (allowing the storage of historical information)
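As an illustration of the caching role, here is a toy proxy that remembers responses for a fixed time. It is a deliberately simplified sketch: `CachingProxy` is a hypothetical class, `fetch` stands in for the real network call to the origin server, and a fixed time-to-live replaces the Cache-Control rules a real proxy would honour.

```python
import time
from typing import Callable


class CachingProxy:
    """A toy forward proxy that caches responses for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self.fetch = fetch              # function that contacts the origin server
        self.ttl = ttl_seconds
        self.cache: dict[str, tuple[float, str]] = {}
        self.origin_hits = 0

    def get(self, url: str) -> str:
        now = time.monotonic()
        cached = self.cache.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]            # fresh copy: answer without contacting the origin
        self.origin_hits += 1
        body = self.fetch(url)          # forward the request on the client's behalf
        self.cache[url] = (now, body)
        return body


proxy = CachingProxy(fetch=lambda url: f"content of {url}")
proxy.get("https://example.com/")
proxy.get("https://example.com/")
print(proxy.origin_hits)  # 1
```

The second request never reaches the origin server, which is exactly why caching proxies save bandwidth and time.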

HTTP Connections

Before a client and a server can exchange a request and a response, a connection must be established; HTTP traditionally uses TCP for this.

The default behaviour of HTTP/1.0 is to establish a separate TCP connection for each HTTP request/response pair. This approach is not very efficient, especially when multiple requests are rapidly being sent one after another.

To address this issue, HTTP/1.1 introduced pipelining (which proved to be challenging to implement) and persistent connections. With persistent connections, multiple requests can be sent over the same TCP connection, enhancing efficiency. But HTTP/2 took it even further by introducing message multiplexing, allowing for the simultaneous transmission of multiple messages over a single connection. This optimization helps maintain a warm connection and improves overall efficiency.

Work has continued on transport protocols better suited to HTTP. For instance, Google developed QUIC, which builds on UDP (User Datagram Protocol) to provide a more efficient and faster transport. QUIC has since been standardized by the IETF and is the transport used by HTTP/3.

When compared to other networking protocols, UDP is quite straightforward. It involves identifying a target computer and sending data packets called "datagrams" to it. Notably, UDP doesn't establish any specific order for packet arrival, and it does not check whether the datagrams actually reach their destination. 

Although UDP includes checksums to verify data integrity and port numbers to distinguish the data’s role at both ends, it does not set up a connection before sending anything. This leaves the program using UDP exposed to the unreliable aspects of the underlying network.

Consequently, the data sent via UDP may or may not be delivered, and its arrival order isn't controlled as it is in TCP. This can result in glitchy, out-of-order, or missing data at the final destination.

However, in situations where error checking and data correction aren’t necessary, this lack of strict control may not be a significant issue. This explains why UDP is preferred in applications like video streaming and gaming, where delivering the signal on time is prioritized over occasional glitches.

Under Control

Because HTTP is so flexible and extendable, over time it allowed for more and more control and functionality.

Here are a few things HTTP requests can control:

  • Caching – HTTP can instruct proxies and clients about what to cache and for how long.
  • Authentication – Some pages may be protected so that only specific users can access them. HTTP can provide basic authentication through headers such as WWW-Authenticate, or a site can set up a session using HTTP cookies.
  • Reverse-Proxy and Tunneling – Sometimes, computers inside a private network (like a company’s intranet) don’t want to reveal their real internet addresses to the outside world. When they need to communicate with the internet, their HTTP requests go through special helpers called “proxies,” which act as middlemen to cross this network barrier. Not all proxies are the same, though. Take the SOCKS protocol, for example: it works at a lower level and can handle other types of internet traffic besides web browsing, such as file downloads. In a nutshell, think of proxies as helpers that allow computers inside a private network to talk to the internet without showing their real addresses, and that can assist with various types of online activities.
  • Sessions – Imagine you’re visiting a website and doing different things, like adding items to an awesome WooCommerce shopping cart or customizing how your amazing WordPress.com website looks. On its own, HTTP is stateless: it doesn’t remember what you did from one request to the next. It’s like the website forgets who you are every time. But there’s a clever trick websites use called “HTTP cookies.” A cookie is like a tiny note that your browser keeps for that website and sends back with each request. With cookies, the website can remember what you did, such as your preferences or what you put in your shopping cart, even when you move from one page to another. So these “sessions” are a way for websites to keep a memory of what you’ve been doing, even though HTTP itself doesn’t remember anything between requests. They’re handy not just for shopping online but for any website where you can change how it looks or works to your liking, making your experience smoother and more personalized.
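Here is a minimal sketch of the cookie-based session idea using Python's standard-library `http.cookies.SimpleCookie`. The in-memory `SESSIONS` store, the `session_id` cookie name, and the `start_session` / `load_session` helpers are all hypothetical; real sites typically keep sessions in a database and add more cookie attributes (such as Secure and an expiry).

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical server-side session store: session id -> data for that visitor.
SESSIONS: dict[str, dict] = {}


def start_session() -> str:
    """Create a new session and return the Set-Cookie header line for the response."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True  # not readable from page JavaScript
    return cookie.output()  # a full "Set-Cookie: session_id=..." header line


def load_session(cookie_header: str) -> Optional[dict]:
    """Look up the session named in the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return SESSIONS.get(morsel.value) if morsel is not None else None


set_cookie = start_session()
# The browser stores the cookie and sends "Cookie: session_id=..." on later requests.
session_id = set_cookie.split("session_id=")[1].split(";")[0]
load_session(f"session_id={session_id}")["cart"].append("WooCommerce hoodie")
print(load_session(f"session_id={session_id}; theme=dark"))
```

The cookie itself only carries an opaque identifier; the "memory" (the cart) lives on the server and is looked up on every request.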

HTTP-based APIs

Let’s start with the obvious question: what is an API?

An API, or Application Programming Interface, is like a bridge that allows different software programs or systems to communicate and interact with each other. It defines a set of rules and protocols that determine how requests for certain actions or information should be made and how the responses will be structured.

Imagine an API as a menu in a restaurant. The menu provides a list of dishes you can order, along with a description of each dish. When you specify what you’d like to order, the kitchen (i.e., the system) prepares the dish and serves it to you. In this analogy:

  1. Menu: The API provides a list of available functions or actions that a developer can request from a software system.
  2. Order: A developer makes a request to the API, specifying what they want to do or the data they want to retrieve.
  3. Kitchen: The system or software that implements the API processes the request and performs the necessary actions.
  4. Serving: The system sends back a response to the developer with the requested data or the result of the action.

As a more realistic example, let’s say you would like to build a mobile app (or a website, it doesn’t matter) that displays the current weather. You won’t be standing at the window checking what the weather looks like and updating the app by hand, right? Plus, you can’t possibly know the weather everywhere around the globe. So you need to retrieve this information from somewhere.

In this case, you can use a weather service API to pull the information based on the location of your phone/computer or based on a search query. The API will tell you how to structure the request and which information it can provide, so you can display it in your application.

APIs are crucial for enabling different software applications, services, or systems to work together seamlessly. They are used in a wide range of scenarios, from web development (allowing websites to interact with databases or external services) to connecting different software components within a single application. APIs are the building blocks of the modern digital world, facilitating data sharing and functionality across various platforms and services.

That said, back to the HTTP-based APIs:

I think by now we have established that HTTP is the foundation of communication on the Web. HTTP-based APIs use different HTTP methods to perform various actions. The most common methods include:

  • GET: Retrieve data from the server.
  • POST: Send data to the server to create or update resources.
  • PUT: Update a specific resource on the server.
  • DELETE: Remove a specific resource from the server.
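To see how those methods map onto actions, here is a toy, in-memory sketch of a REST-style handler for a hypothetical `/orders` resource. Real APIs (WooCommerce’s included) do far more: validation, authentication, JSON encoding, pagination. The status codes used here (200, 201, 204, 404, 405) are standard HTTP codes, but the `handle` function and its routing logic are invented for illustration.

```python
from typing import Optional

# Hypothetical in-memory "database" of orders, keyed by id.
orders: dict[int, dict] = {}
next_id = 1


def handle(method: str, path: str, body: Optional[dict] = None) -> tuple[int, object]:
    """Map an HTTP method and path onto a create/read/update/delete action.

    Returns a (status code, response data) pair.
    """
    global next_id
    parts = path.strip("/").split("/")  # "/orders/3" -> ["orders", "3"]
    if parts[0] != "orders" or len(parts) > 2 or (len(parts) == 2 and not parts[1].isdigit()):
        return 404, {"error": "unknown resource"}

    if len(parts) == 1:  # the collection: /orders
        if method == "GET":
            return 200, list(orders.values())  # list all orders
        if method == "POST":
            order = {"id": next_id, **(body or {})}
            orders[next_id] = order
            next_id += 1
            return 201, order  # 201 Created
        return 405, {"error": "method not allowed"}

    order_id = int(parts[1])  # a single item: /orders/<id>
    if order_id not in orders:
        return 404, {"error": "order not found"}
    if method == "GET":
        return 200, orders[order_id]
    if method == "PUT":
        orders[order_id] = {"id": order_id, **(body or {})}  # replace the whole order
        return 200, orders[order_id]
    if method == "DELETE":
        del orders[order_id]
        return 204, None  # 204 No Content
    return 405, {"error": "method not allowed"}
```

For example, on a fresh store `handle("POST", "/orders", {"total": "19.99"})` returns `(201, {"id": 1, "total": "19.99"})`, after which `handle("GET", "/orders/1")` returns that same order with status 200.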

APIs use URLs to identify and locate resources on the server. The URL typically includes the endpoint (the specific resource) and may include query parameters for filtering or specifying additional information.

For example, in our Woo Mobile app, when we want to display the orders in a user’s store, we use the List all orders endpoint, GET /wp-json/wc/v3/orders, which retrieves a list of that store’s orders from the server so the app can display them.
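The URL is where the endpoint and the query parameters meet. The sketch below builds a list-orders URL and then takes it apart again using Python’s `urllib.parse`. The store address is a placeholder, `orders_url` is a hypothetical helper, and while `status`, `per_page`, and `page` are filters I believe this WooCommerce endpoint accepts, treat them as assumptions and check the official REST API documentation. A real request would also need authentication, which is left out here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

STORE = "https://example-store.com"  # hypothetical store address
ENDPOINT = "/wp-json/wc/v3/orders"   # the "List all orders" endpoint


def orders_url(**params: object) -> str:
    """Build the list-orders URL, adding any filters as query parameters."""
    query = urlencode(params)
    return f"{STORE}{ENDPOINT}?{query}" if query else f"{STORE}{ENDPOINT}"


url = orders_url(status="processing", per_page=20, page=1)
print(url)
# https://example-store.com/wp-json/wc/v3/orders?status=processing&per_page=20&page=1

# Taking the URL apart again, as a server would.
parts = urlsplit(url)
print(parts.path)             # /wp-json/wc/v3/orders
print(parse_qs(parts.query))  # {'status': ['processing'], 'per_page': ['20'], 'page': ['1']}
```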

When a client (in the above case the Woo App, not a browser) makes an HTTP request to an API endpoint, it includes information like the HTTP method, headers (metadata), and often data in the request body. The server processes the request and sends an HTTP response back to the client, which includes a status code (indicating the result of the request) and a response body containing the data. Data exchanged between clients and servers in HTTP-based APIs is often formatted as JSON (JavaScript Object Notation) or XML (Extensible Markup Language). These formats are lightweight, easy to parse, and human-readable.

But there’s more to it. To secure access to APIs, authentication (proving your identity) and authorization (determining what actions you’re allowed to perform) mechanisms are commonly used. This can involve API keys, OAuth tokens, or other authentication methods. Many APIs impose rate limits to control how frequently a client can make requests. Rate limiting helps prevent abuse and ensures fair usage of the API.
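Rate limiting is often implemented with a token bucket: each client gets a bucket that refills at a steady rate, and every request spends one token. The sketch below shows that one common technique; it is not necessarily what any particular API uses, and the `TokenBucket` class and its parameters are hypothetical. When a request is refused, APIs typically answer with the standard status code 429 Too Many Requests.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed now, consuming one token."""
        now = self.clock()
        # Refill in proportion to the time elapsed, without exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # an API would typically answer 429 Too Many Requests here
```

For example, `TokenBucket(capacity=2, rate=1.0)` lets two requests through immediately, refuses a third, and allows one more after a second has passed. The capacity controls how bursty a client may be, while the rate sets the long-term average.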

So, these are the tools and tricks that websites and servers use to talk to each other and send information back and forth. It’s like having a conversation, but in the language of the internet. Well-designed APIs also (usually) provide clear error messages and status codes in their responses to inform clients about issues or failures in a request.
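Status codes are grouped by their first digit, which makes them easy to interpret even when a client doesn’t know a specific code. The sketch below uses Python’s standard-library `http.HTTPStatus` to turn a code into a readable summary; the `describe` function and its category labels are choices made for this example.

```python
from http import HTTPStatus


def describe(status_code: int) -> str:
    """Turn a numeric status code into a readable summary.

    Raises ValueError for codes the standard library does not know.
    """
    status = HTTPStatus(status_code)
    if 200 <= status_code < 300:
        category = "success"
    elif 300 <= status_code < 400:
        category = "redirection"
    elif 400 <= status_code < 500:
        category = "client error"
    elif status_code >= 500:
        category = "server error"
    else:
        category = "informational"
    return f"{status_code} {status.phrase} ({category})"


for code in (200, 404, 429, 503):
    print(describe(code))
```

A 4xx code means the client should change something about its request, while a 5xx code means the problem is on the server’s side and retrying later may help.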

In conclusion, HTTP has been the backbone of the Internet, enabling seamless communication between clients and servers. From its humble beginnings as a simple protocol to its evolution into the versatile and extensible HTTP/2, this protocol has played a crucial role in delivering web content and powering modern web applications. With each version and improvement, HTTP has adapted to the growing demands of the digital landscape, facilitating faster and more efficient data transmission. As we continue to navigate the ever-changing online world, we can appreciate the impact and significance of HTTP in shaping the way we access and interact with information on the web.
