Day 008 #FromZeroToHacker – How the web works

As users, we treat the web as something we simply use: type a website address and that’s that. As hackers (white hat hackers, right?), we need to understand how the web works: how URLs are formed, what HTTP status codes mean, how cookies are stored, what languages are used, how we interact with databases, and more.

Today there is a lot, so let’s start our daily #FromZeroToHacker lesson.

Table of contents
Introduction
What I have learnt today?
Stats
Resources

Introduction to How the web works

There is a lot of content today, so let’s get started without further ado (I just love how that sounds).

HTTP request and response cycle

What I have learnt today?

HTTP in detail

What is HTTP(S)

What is HTTP

HTTP (or HyperText Transfer Protocol) is a set of rules used for communicating with web servers.

What is HTTPS

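HTTPS is the secure version of HTTP: data is encrypted in transit, which stops other people from seeing what you send and receive, and it also gives you assurance that you are talking to the correct web server and not an impersonator.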

Requests and responses

To access a website, we make a request to a web server for assets (HTML, videos, images…), and we download the responses.

What is a URL?

A URL (Uniform Resource Locator) is an instruction on how to access a resource on the Internet:

URL format


Scheme: The protocol to use for accessing the resource (HTTP, HTTPS, FTP…).
User: Some services require authentication, so the URL can carry a username and password to log in with.
Host: The domain name or IP address of the server to access.
Port: The port to connect to, usually 80 for HTTP and 443 for HTTPS.
Path: The file name or location of the resource you are trying to access.
Query string: Extra bits of information sent to the requested path. For example, in the URL above, the ID is 1.
Fragment: A reference to a location on the actual page requested.
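
To see those parts in practice, here is a minimal sketch using Python’s standard urllib.parse module. The URL is a made-up example (example.com, the credentials, path, and fragment are all assumptions chosen only so that every part of the list above is present):

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, invented to exercise every part described above.
url = "http://user:password@example.com:80/view-room?id=1#task3"

parts = urlsplit(url)

print("Scheme:  ", parts.scheme)           # protocol to use
print("User:    ", parts.username)         # credentials embedded in the URL
print("Password:", parts.password)
print("Host:    ", parts.hostname)         # domain name or IP address
print("Port:    ", parts.port)             # port to connect to
print("Path:    ", parts.path)             # location of the resource
print("Query:   ", parse_qs(parts.query))  # extra bits of information
print("Fragment:", parts.fragment)         # location inside the page
```

Note that the fragment is handled by the browser and is normally not sent to the server.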

Making a request

How the web works: Making a GET request to a website

The smallest request is a single line, GET / HTTP/1.1 (strictly speaking, HTTP/1.1 also requires a Host header), but most of the time we need to send other data in the request as well. These extra pieces of data are called headers:

GET / HTTP/1.1
Host: tryhackme.com
User-Agent: Mozilla/5.0 Firefox/87.0
Referer: https://tryhackme.com/

The request is pretty simple: we are GETting (GET method) the home page (/) using HTTP version 1.1, the website we want is tryhackme.com, we are using the Firefox browser, and we are telling the web server that the page that referred us is https://tryhackme.com/.

As a response, we get:

HTTP/1.1 200 OK
Server: nginx/1.15.8
Date: Fri, 09 Apr 2021 13:34:03 GMT
Content-Type: text/html
Content-Length: 98

<html>
<head>
    <title>TryHackMe</title>
</head>
<body>
    Welcome To TryHackMe.com
</body>
</html>

Again, pretty straightforward: a 200 OK status code tells us the request succeeded. Then come the server software and version, the date, time, and time zone of the web server, the Content-Type of the data being sent, and a Content-Length that lets the client confirm no data is missing. After the headers comes a blank line, and finally the HTML code itself.
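
The structure of a response (status line, headers, blank line, body) can be pulled apart with a few string operations. This is only a sketch: parse_response is a hypothetical helper written for this post, the body is shortened, and real clients use a proper HTTP library instead of splitting strings by hand.

```python
# A shortened copy of the response above. HTTP separates lines with \r\n
# and separates the headers from the body with one blank line.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx/1.15.8\r\n"
    "Date: Fri, 09 Apr 2021 13:34:03 GMT\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Welcome To TryHackMe.com</body></html>"
)


def parse_response(raw):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


code, reason, headers, body = parse_response(raw_response)
print(code, reason)
print(headers["content-type"])
print(body)
```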

HTTP Methods

HTTP methods show the action the client intends when making an HTTP request. While there are a lot of HTTP methods, these are the most used:

GET: Used for getting information from a web server.
POST: Used for submitting data to the web server, and creating new records.
PUT: Used for submitting data to the web server, and updating an existing record.
DELETE: Used for deleting records from a web server.
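
As a rough illustration, the four methods can be expressed with Python’s standard urllib.request.Request class. Nothing is sent over the network here; the requests are only built and inspected. The endpoint https://example.com/api/notes and the JSON payloads are assumptions made up for the example.

```python
from urllib.request import Request

# Hypothetical API endpoint; the requests are built but never sent.
base = "https://example.com/api/notes"

requests_to_send = [
    Request(base, method="GET"),                                      # read records
    Request(base, data=b'{"text": "hello"}', method="POST"),          # create a record
    Request(base + "/1", data=b'{"text": "updated"}', method="PUT"),  # update record 1
    Request(base + "/1", method="DELETE"),                            # delete record 1
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```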

HTTP Status Codes

When an HTTP server responds, the first line of the response contains a status code that tells the client the outcome of the request and how to handle it. There are 5 different ranges:

1XX -> Information response: Tells the client that the request has been accepted and they should continue sending the rest of the request.
2XX -> Success: Tells the client their request has been successful.
3XX -> Redirection: Used to redirect the client’s request to another resource.
4XX -> Client errors: Used to inform the client that there was an error with their request.
5XX -> Server errors: Reserved for errors from the server side.
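
Since the range is decided by the first digit, a status code can be classified with an integer division. A small sketch follows: describe is a hypothetical helper, while the standard phrases come from Python’s built-in http.HTTPStatus enum.

```python
from http import HTTPStatus

# The five ranges listed above, keyed by the first digit of the code.
CLASSES = {
    1: "Information response",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return the code, its standard phrase (if any), and its range."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not every number in a range is a registered status code.
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"


for status in (200, 301, 404, 503):
    print(describe(status))
```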

There are loads of HTTP status codes that would be too long to explain here. Luckily, we have a list of HTTP Status codes in Wikipedia.

Headers

As we explained a bit ago, headers are additional bits of data sent along with requests to the web server, and with the responses it sends back.

Common request headers

Host: Web servers sometimes host multiple websites, so the Host header tells the server which one we need.
User-Agent: Your browser software and version number.
Content-Length: When sending data to the web server (from a form, for example), this tells the server how much data to expect, so it can ensure it isn’t missing any.
Accept-Encoding: Tells the web server what types of compression methods the browser supports.
Cookie: Data sent to the server to help remember your information.

Here is a list of all Request fields

Common response headers

Set-Cookie: Information to store which gets sent back to the web server on each request.
Cache-Control: How long to store the content in the browser’s cache.
Content-Type: Tells the client what type of data is being returned (HTML, CSS, JavaScript, etc).
Content-Encoding: What method has been used to compress the data.

Here is a list of all Response fields

Cookies

Cookies, in an Internet context, are small pieces of data stored in your browser. Every further request you make to a website sends that website’s cookies back to the server. Because HTTP is stateless (it doesn’t keep track of previous requests), this is how the web server is reminded who we are, what our personal settings are, or whether we have been here before. Here’s an example of an HTTP request:

Cookies

While they can be used for many purposes, cookies are mostly used for website authentication.
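
The round trip can be sketched with Python’s standard http.cookies module: the server sends a Set-Cookie header, the browser stores it, and later requests carry it back in a Cookie header. The cookie name session and its value are invented for the example.

```python
from http.cookies import SimpleCookie

# 1. The server's response includes a Set-Cookie header (hypothetical value).
set_cookie_header = "session=abc123; Path=/; HttpOnly"

# 2. The browser stores the cookie. SimpleCookie plays the browser's jar here.
jar = SimpleCookie()
jar.load(set_cookie_header)

# 3. On later requests to the same site, the browser sends back only the
#    name=value pairs in a Cookie header, not the attributes.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

print("Stored value:", jar["session"].value)
print("Stored path: ", jar["session"]["path"])
print("Cookie:", cookie_header)
```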

Viewing your cookies

You can easily view what cookies your browser is sending to a website by using the developer tools. For example, if I open them on the TryHackMe website, here’s what I can see:

Cookies dev tools

How websites work

How websites work

When you visit a website, your browser makes a request to a web server asking for information about the page. The server responds with the data that your browser uses to show you the page.

![[day_008_how_websites_work.png]]
There are two components that make up a website: Front-End (client side), how your browser renders a website, and Back-End (server side), a server that processes your requests, and returns a response.

HTML

Websites are normally created using three languages: HTML, CSS, and JavaScript.

HTML (HyperText Markup Language) is the language websites are written in. Elements, or tags, are the building blocks of HTML pages, creating the structure of a website. Here is an example:

![[day_008_html.png]]

The HTML structure shown above has the following components:

- <!DOCTYPE html> defines that the page is an HTML5 document
- <html> element is the root element of the HTML page
- <head> element contains information about the page (such as the page title)
- <body> defines the HTML's body, the content that would be displayed in the browser
- <h1> defines a large Heading
- <p> element defines a paragraph

There are more elements, or tags, used for different purposes. Here is a list of HTML5 tags

Tags can contain attributes, such as the class name, location of a file, etc. For example, <p id="header"> is a paragraph with the attribute id (short for identifier) set to the value header.
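
To make tags and attributes more concrete, here is a small sketch that walks through a page with Python’s standard html.parser module and lists each opening tag with its attributes. The page text is a made-up stand-in for the screenshot above, and TagLister is a hypothetical helper class.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


# Hypothetical page, similar in shape to the example described above.
page = """<!DOCTYPE html>
<html>
<head><title>Page Title</title></head>
<body>
<h1>Example Heading</h1>
<p id="header">Example paragraph</p>
</body>
</html>"""

lister = TagLister()
lister.feed(page)

for tag, attributes in lister.tags:
    print(tag, attributes)
```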

JavaScript

JavaScript (JS) is one of the most popular coding languages and allows pages to become interactive. All major web browsers have a dedicated JavaScript engine to execute the code on users’ devices.

While HTML creates the structure, JavaScript gives functionality to the web pages: without it, we would never see interactive elements and the pages would always be static (and stale!).

With JS, we can update the page in real-time, give functionality to the ‘Send email’ button or change the colour of an element of the website, for example.

The file containing the JavaScript code is loaded in the page source code with the <script> tag:
<script src="/location/of/javascript_file.js"></script>

Sensitive Data Exposure

Sensitive Data Exposure occurs when a website doesn’t properly protect (or remove) sensitive clear-text information, leaving it visible to the end user in the page’s source code.

By viewing the source code, we can see if a developer has forgotten to remove login credentials, internal links, or important information.

This sensitive information can potentially be leveraged to let an attacker access different parts of a website or application. For example, there could be comments in the source code containing “temporary” login credentials that haven’t been deleted, granting access to the attacker.

One of the first things attackers do is review the page source code and search for mistakes like these.
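
That review can be partly automated. The sketch below uses Python’s standard html.parser module to pull every HTML comment out of a page’s source. The page and the leftover credentials in it are invented for the example, and CommentFinder is a hypothetical helper class.

```python
from html.parser import HTMLParser


class CommentFinder(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


# Hypothetical page source with a forgotten developer comment.
source = """<html><body>
<!-- TODO remove before launch: test login admin / changeme -->
<p>Welcome</p>
</body></html>"""

finder = CommentFinder()
finder.feed(source)

for comment in finder.comments:
    print("Found comment:", comment)
```

Only run this kind of check against pages you are allowed to test, such as the TryHackMe lab machines.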

Sensitive Data Exposure HTML code example

HTML Injection

HTML injection is a vulnerability that occurs when unfiltered user input is displayed on the page. If a website doesn’t sanitise (filter out) user input, an attacker can inject HTML code into a vulnerable website.

Letting an attacker inject HTML is troublesome, as they can manipulate the page, defacing the website or tricking users into revealing their username and password.
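
The difference between vulnerable and sanitised output can be sketched in a few lines. Both render functions are hypothetical, written only for this post; html.escape is the standard Python function that turns characters such as < and > into harmless entities. Real applications usually rely on their template engine to do this escaping.

```python
from html import escape


def render_greeting_unsafe(name):
    # Vulnerable: user input is pasted into the page as-is.
    return f"<p>Welcome, {name}!</p>"


def render_greeting_safe(name):
    # Sanitised: special characters are escaped before being displayed.
    return f"<p>Welcome, {escape(name)}!</p>"


# Instead of a name, the attacker submits HTML (made-up payload).
payload = '<a href="http://evil.example">Click me</a>'

print(render_greeting_unsafe(payload))  # the link becomes part of the page
print(render_greeting_safe(payload))    # the link is shown as plain text
```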

HTML Injection example

Putting it all together

Putting It all together

We have learned that a lot goes on behind the scenes when you request a website: your computer needs to find the server’s IP address, using DNS. Then, the computer communicates with the web server using HTTP, and the web server returns the HTML, CSS, JavaScript (and more), which your browser displays for you nicely formatted:

Website request cycle

Other components

But there is more. Way more. Let’s see what components we use while we “surf the web” (sorry!):

Load balancers

When a website’s traffic gets high, running it on just one server no longer does the job. Load balancers ensure that high-traffic websites can handle the traffic: when you request a website, the load balancer forwards your request to one of the multiple servers behind it. It uses different algorithms to help it decide which server is best to deal with your request.

Load balancers


Load balancers also perform periodic checks to ensure the servers are running correctly, a process called health check.
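
One of the simplest of those algorithms is round-robin, which hands requests to each server in turn. Here is a minimal sketch combining it with a health check. RoundRobinBalancer and the server names are assumptions for illustration, and the health check is simulated with a plain function instead of a real network probe.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that failed the health check."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def health_check(self, is_up):
        # is_up is any function that returns True for a working server.
        self.healthy = {server for server in self.servers if is_up(server)}

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At most one full lap around the ring is needed to find a healthy one.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
print([balancer.pick() for _ in range(4)])

# Simulate server-2 going down: it is skipped from now on.
balancer.health_check(lambda server: server != "server-2")
print([balancer.pick() for _ in range(4)])
```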

CDN (Content Delivery Networks)

A CDN is an excellent resource for cutting down traffic to a busy website. It lets you host your website’s static files (JS, CSS, videos…) across thousands of servers all over the world, so when a user requests one of these files, the CDN serves it from the physically closest server instead of one that is, potentially, on the other side of the world.

Databases

Websites need a way of storing information. It may be users and their passwords, items in a shop, etc. Web servers can communicate with databases to store and retrieve that information.

A database can range from just a text file to complex clusters of multiple servers. Common databases are MySQL, MariaDB, MongoDB, PostgreSQL… (GraphQL is often mentioned alongside them, but it is a query language for APIs rather than a database.)
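
To give an idea of how Back-End code talks to a database, here is a small sketch using SQLite, which ships with Python, as a stand-in for the larger databases named above. The products table and its contents are invented for the example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Keyboard", 49.9), ("Mouse", 19.5)],
)
conn.commit()


def find_product(connection, product_id):
    """Look up one product, as a shop page would when it receives ?id=1."""
    # The ? placeholder keeps user input separate from the SQL text.
    return connection.execute(
        "SELECT name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()


print(find_product(conn, 1))
print(find_product(conn, 99))  # no such product: None
```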

WAF (Web Application Firewall)

A WAF sits between your web request and the web server, protecting the web server from hacking or DDoS (Distributed Denial of Service) attacks. It analyses web requests for common attack techniques. It also checks whether an excessive number of web requests is being sent, using rate limiting (or throttling) to allow only a certain number of requests from the same IP per second.
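
The rate limiting part can be sketched as a sliding window per IP address. This is only one possible approach, not how any particular WAF does it. RateLimiter is a hypothetical class, the IP addresses come from ranges reserved for documentation, and timestamps are passed in by hand so the example is easy to follow.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # IP -> timestamps of recent requests

    def allow(self, ip, now):
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too many requests: block this one
        recent.append(now)
        return True


limiter = RateLimiter(max_requests=3, window_seconds=1.0)

# Four requests from the same IP within half a second: the fourth is blocked.
for timestamp in (0.0, 0.1, 0.2, 0.3):
    print(timestamp, limiter.allow("203.0.113.5", timestamp))

# A different IP has its own counter.
print(limiter.allow("198.51.100.7", 0.3))
```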

Web Application Firewall

How Web servers work

What is a Web server?

A web server is software that listens for incoming connections and uses the HTTP protocol to deliver web content to its clients. The most common web servers are Apache, Nginx, and NodeJS.

A web server delivers files from the root directory.

Virtual hosts

Web servers can host multiple websites with different domain names, using virtual hosts. The server software checks the hostname being requested from the HTTP headers and matches that against its virtual hosts (text-based configuration files). If it finds a match, that website is served. If not, the default website is served instead.
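
The matching logic boils down to a lookup with a fallback. In this sketch the hostnames and directories are assumptions for illustration. Real servers such as Apache or Nginx read this mapping from their configuration files.

```python
# Hypothetical virtual hosts: hostname -> root directory of that website.
VIRTUAL_HOSTS = {
    "one.example": "/var/www/website_one",
    "two.example": "/var/www/website_two",
}
DEFAULT_ROOT = "/var/www/html"  # served when no virtual host matches


def document_root(host_header):
    """Pick the root directory based on the request's Host header."""
    # The Host header may include a port (one.example:8080), and hostnames
    # are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


print(document_root("one.example"))
print(document_root("TWO.example:8080"))
print(document_root("unknown.example"))
```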

Static VS Dynamic content

Static content is content that never changes: JavaScript, CSS, etc. These files are served directly from the web server with no changes.

Dynamic content, on the other hand, is content that may change: for example, a blog, where the home page changes whenever a new entry is published.

These changes to what you end up seeing are done in the Back-End, using programming and/or scripting languages. The HTML shown is the result of the processing from the Back-End. What you see in your browser is the Front-End.

Scripting and Back-End languages

Some examples of these languages are PHP, Python, Ruby, NodeJS, Perl, and more. These languages interact with databases, call other services, process data from and for the user, and much more.

Stats

From 278.000th to 238.382nd. Let’s go!

Here is also the Skill Matrix:

Skill matrix

Resources

Path: Pre Security

How the Web Works

TryHackMe: HTTP in detail

Other resources

List of HTTP Status Codes
All Request fields
All Response fields
List of HTML5 tags