How a Browser Works

A Beginner-Friendly Guide to Browser Internals

Published
6 min read
I’m a BCA student and a full-stack developer passionate about building scalable web applications. I enjoy working with modern technologies and sharing my learning through blogs.

What a browser actually is (beyond “it opens websites”)

Beyond just opening websites, a web browser is an application that acts as an intermediary between the user and the web: it fetches resources and turns raw HTML, CSS, and JavaScript into interactive, visual pages. It is sometimes described as an operating system within an operating system, because for many people it is the primary environment for digital work and communication. Its roles include:

  • Rendering Engine (The "Translator")

  • A "Client" for Data Retrieval

  • Security and Privacy Manager

  • Platform for Web Applications

  • Digital Tools and Customization

  • Privacy Enhancing Tools (VPNs/Tor)

Main parts of a Browser

A web browser consists of seven main high-level components: the User Interface (UI), Browser Engine, Rendering Engine, Networking, JavaScript Engine, Data Storage, and UI Backend. Together, these components translate HTML, CSS, and JavaScript code into interactive visual pages, managing user actions, network requests, and data persistence.

  • User Interface (UI): The part users interact with, including the address bar, back/forward buttons, bookmarks menu, and tabs. It does not render the page content but sends instructions to the engine.

  • Browser Engine: Acts as the manager, marshaling actions between the UI and the rendering engine.

  • Rendering Engine: Responsible for displaying requested content by parsing HTML and CSS to render the visual page. Examples include Blink (Chrome), WebKit (Safari), and Gecko (Firefox).

  • Networking: Handles network communication, such as HTTP/HTTPS requests, to fetch resources from the internet.

  • JavaScript Engine: Parses and executes JavaScript code to make web pages dynamic and interactive. Examples include V8 (Chrome), JavaScriptCore (Safari), and SpiderMonkey (Firefox).

  • Data Storage (Persistence): A layer for saving data locally on the user's machine, such as cookies, localStorage, and IndexedDB.

  • UI Backend: Used for drawing basic widgets like combo boxes and windows, utilizing the operating system's user interface methods.

User Interface: address bar, tabs, buttons

The browser user interface (UI) consists of navigation controls (address bar, back/forward/refresh buttons) and tab management. It accepts user input and passes it on to the browser engine; the page content itself is drawn by the rendering engine, not by the UI. The address bar (called the Omnibox in Chrome) accepts URLs and search terms, while tabs manage multiple concurrent web sessions. Buttons trigger actions such as navigation, bookmarking, or opening menus.

Key Components of Browser UI:

  • Address Bar (URL Bar): A central input field for entering website addresses (URLs).

  • Tabs: Graphical elements that allow multiple documents, panels, or webpages to exist within a single window, enabling efficient switching between them.

  • Buttons: Interactive elements for navigation (back, forward, refresh, home) and specific page actions (bookmarks, extensions).

Browser Engine vs Rendering Engine

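The browser engine acts as the high-level coordinator of the browser application, while the rendering engine is the component it drives that is responsible specifically for displaying web content. In practice the two are often bundled together, and the terms are sometimes used interchangeably (Blink, WebKit, and Gecko are commonly called browser engines).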

Browser Engine

  • Role: The browser engine serves as a bridge, managing communication and actions between the user interface (UI) (address bar, buttons, menus, etc.) and the underlying components, including the rendering engine and networking layer.

  • Function: It interprets user interactions (like clicking a link or typing a URL) and directs the relevant engines to perform necessary tasks, such as fetching data from the network or telling the rendering engine what to display.

Rendering Engine

  • Role: The rendering engine's primary job is to take the raw web page data (HTML, CSS, images, etc.) and transform it into an interactive visual representation on the screen.

  • Function: It parses the HTML to create the Document Object Model (DOM), processes CSS to build the CSS Object Model (CSSOM), combines them into a render tree, calculates the layout of elements, and finally "paints" the pixels on the screen.
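The render tree step described above can be sketched in a few lines. This is a simplified model, not how any real engine is implemented: the `Node` and `RenderNode` classes, the `build_render_tree` function, and the tag-only "CSSOM" dictionary are all assumptions made for illustration. It shows the key idea that the render tree pairs each visible DOM node with its styles and leaves out anything with `display: none`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical, minimal DOM node: just a tag and its children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class RenderNode:
    """A visible node paired with the styles that apply to it."""
    tag: str
    style: dict[str, str]
    children: list["RenderNode"] = field(default_factory=list)


def build_render_tree(node, cssom):
    """Combine a DOM node with a toy CSSOM (tag name -> declarations).

    Nodes styled with display: none are dropped together with their subtree.
    """
    style = dict(cssom.get(node.tag, {}))
    if style.get("display") == "none":
        return None
    kids = [r for c in node.children
            if (r := build_render_tree(c, cssom)) is not None]
    return RenderNode(node.tag, style, kids)


dom = Node("body", [Node("h1"), Node("p"), Node("script")])
cssom = {"h1": {"color": "red"}, "script": {"display": "none"}}

tree = build_render_tree(dom, cssom)
print([child.tag for child in tree.children])  # ['h1', 'p']
```

Real engines match full selectors, apply the cascade and inheritance, and have default styles that already hide elements such as `<script>`; the dictionary lookup here only stands in for that work.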

Networking: how a browser fetches HTML, CSS, JS

A browser fetches HTML, CSS, and JS through a multi-step networking process: DNS resolution, a TCP (and usually TLS) handshake, and HTTP requests, with the rendering engine discovering further resources as it parses the response.

  1. User Input: The user types a URL (e.g., https://example.com) into the browser's address bar and presses enter.

  2. DNS Resolution: The browser needs the IP address for example.com. It checks its local cache, the operating system cache, and then queries a Domain Name System (DNS) server to translate the domain name into an IP address [1].

  3. TCP Connection: Once the browser has the IP address, it establishes a reliable connection to the server using the Transmission Control Protocol (TCP) via a "three-way handshake" [1]. If the URL is HTTPS, this step is followed by a TLS handshake to encrypt the connection.

  4. HTTP Request for HTML: The browser sends an HTTP GET request to the server, asking for the main HTML file [1].

  5. Server Response: The server processes the request and sends the HTML content back to the browser along with an HTTP status code (e.g., 200 OK) [1].

  6. Parsing and Resource Discovery: The browser's rendering engine begins parsing the HTML. As it parses, it encounters <link> tags for CSS and <script> tags for JavaScript files.

  7. Concurrent Requests: For each required CSS and JS file, the browser repeats the process (steps 2-5, though the TCP connection might be reused or new ones opened) to fetch these resources [1].

    • CSS: CSS files are downloaded and parsed into a CSS Object Model (CSSOM).

    • JS: JS files are downloaded, parsed, and executed. By default, a script blocks HTML parsing (and therefore rendering) while it is downloaded and executed, unless specified otherwise (e.g., with async or defer attributes) [1].

  8. Rendering: The browser builds the Document Object Model (DOM) incrementally as the HTML arrives, and builds the CSSOM from the CSS. Once the render-blocking resources have been processed, it combines the two, lays out the page, and paints it for the user to see.
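Steps 1 to 5 above can be approximated with Python's standard library. This is a rough sketch, not what a browser does internally: the function names `plan_request` and `fetch` are made up for this example, and a real browser adds caching, redirects, connection reuse, HTTP/2 or HTTP/3, and much more. `plan_request` only works out where to connect and what to send; `fetch` then performs the DNS lookup, TCP handshake, optional TLS handshake, and GET request.

```python
import socket
import ssl
from urllib.parse import urlsplit


def plan_request(url):
    """Work out where to connect and what bytes to send (no network I/O)."""
    parts = urlsplit(url)
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when one was given explicitly.
    host_header = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, port, use_tls, request


def fetch(url):
    """Send one GET request and return the raw response bytes."""
    host, port, use_tls, request = plan_request(url)
    # create_connection resolves the name (DNS) and performs the TCP handshake.
    sock = socket.create_connection((host, port), timeout=10)
    if use_tls:
        # The TLS handshake happens here, on top of the TCP connection.
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


# Show the request text a client would send for the URL from step 1.
print(plan_request("https://example.com")[3])
```

Calling `fetch("https://example.com")` on a machine with network access should return the status line, headers, and HTML body as bytes; it is deliberately not called in the sketch so the example runs offline.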

HTML parsing and DOM creation

HTML parsing is the process by which a web browser reads HTML markup and converts it into an internal representation called the Document Object Model (DOM). The DOM is a tree-like structure of nodes: each element, piece of text, and comment becomes a node in the hierarchy, and attributes are attached to their element nodes.
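To make the idea of a node tree concrete, here is a small sketch that builds a DOM-like structure with Python's built-in `html.parser` module. The `DomBuilder` class, the dictionary node shape, and the `outline` helper are assumptions made for this example; a real browser follows the much more forgiving parsing algorithm in the HTML standard, which this sketch does not attempt to reproduce.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag (a partial list).
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class DomBuilder(HTMLParser):
    """Build a nested-dictionary tree from HTML using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # later nodes become its children

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append({"tag": "#text", "text": data.strip()})


def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.get("text", node["tag"])]
    for child in node.get("children", []):
        lines += outline(child, depth + 1)
    return lines


builder = DomBuilder()
builder.feed("<html><body><h1 id='t'>Hi</h1><p>One<br>Two</p></body></html>")
print("\n".join(outline(builder.root)))
```

The printed outline shows `html` containing `body`, which in turn contains `h1` and `p`, with the text and the `br` element as their children. The `id` attribute is stored on the `h1` node rather than appearing as a separate child.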

CSS parsing and CSSOM creation

CSS parsing is the browser's process of converting raw CSS text into a structured, usable format called the CSS Object Model (CSSOM). The CSSOM is a tree-like data structure, separate from the DOM, that holds all the styling information for a webpage. Building it is a critical step in the rendering process, because the browser cannot render the page until it knows which styles apply.
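As a rough illustration of turning CSS text into structured data, the sketch below parses flat rules into a list of selectors and declarations. The `parse_css` function is hypothetical and handles only simple `selector { property: value }` rules; it ignores at-rules such as `@media`, nesting, and the cascade, all of which a real CSS parser must deal with.

```python
import re


def parse_css(text):
    """Parse flat CSS rules into (selectors, declarations) pairs."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # strip comments
    rules = []
    for selector_text, body in re.findall(r"([^{}]+)\{([^{}]*)\}", text):
        selectors = [s.strip() for s in selector_text.split(",")]
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                declarations[prop.strip().lower()] = value.strip()
        rules.append((selectors, declarations))
    return rules


css = """
/* headings */
h1 { color: red; font-size: 2em }
.note, p { margin: 0; }
"""

for selectors, declarations in parse_css(css):
    print(selectors, declarations)
```

Each rule ends up as a list of selectors plus a dictionary of property/value pairs, which is close in spirit to the rule objects scripts can reach through `document.styleSheets` in a browser.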

Layout (reflow), painting, and display

Layout (reflow), painting, and display are critical steps in the browser rendering pipeline that turn HTML/CSS into pixels. Layout calculates the geometric positions and sizes of elements. Painting fills in pixels, colors, and images. Display (often involving compositing) puts these pixels on the screen. These, especially reflow, are performance-intensive.

Layout (Reflow)

  • Definition: The process of calculating the exact position and size of every visible element on the page.

  • Triggering Events: Changed window size, font changes, adding/removing DOM nodes, or updating styles that affect geometry (width, height, padding, margins).

  • Impact: A reflow on one element can trigger the reflow of its descendants, its ancestors, and the elements that follow it, making it an expensive operation.
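The ripple effect of a reflow can be seen in a toy block layout. The sketch below is an assumption-heavy simplification: the `Box` class and `layout` function are invented for this example, boxes are stacked vertically at full width, and margin collapsing, inline content, floats, and flexbox are all ignored. It shows how changing one element's height moves everything after it.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A hypothetical block-level box; x, y, and width are filled in by layout()."""
    name: str
    height: int
    margin: int = 0
    x: int = 0
    y: int = 0
    width: int = 0


def layout(boxes, viewport_width):
    """Stack boxes top to bottom and return the total document height."""
    cursor_y = 0
    for box in boxes:
        box.x = box.margin
        box.y = cursor_y + box.margin
        box.width = viewport_width - 2 * box.margin
        cursor_y = box.y + box.height + box.margin
    return cursor_y


page = [Box("header", 50), Box("article", 200, margin=10), Box("footer", 30)]

print(layout(page, 800), [(b.name, b.y) for b in page])
# 300 [('header', 0), ('article', 60), ('footer', 270)]

page[0].height = 80  # one geometry change on the first box...
print(layout(page, 800), [(b.name, b.y) for b in page])
# 330 [('header', 0), ('article', 90), ('footer', 300)]
```

Only the header's height changed, yet the article and footer both received new positions and the document grew taller, which is why geometry changes are more expensive than purely visual ones.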

Painting (Rasterization)

  • Definition: The process of filling in the actual pixels on the screen, including colors, text, images, and shadows.

  • Triggering Events: Changes to CSS properties that do not affect the layout, such as color, background-color, visibility, or outline.

  • Characteristics: Usually occurs after a reflow, but can occur independently if only the appearance (not geometry) changes.

Display & Compositing

  • Definition: The final step where layers created during painting are combined and sent to the screen.

  • GPU Utilization: Modern browsers often use the GPU for compositing, which is more efficient than painting on the CPU.
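The relationship between the three stages can be summarised as a lookup: a change that affects geometry reruns everything, a purely visual change skips layout, and some properties can be handled by compositing alone. The sketch below encodes that rule of thumb using only the properties named in this article; the `stages_for` function is hypothetical, and what actually happens depends on the engine and on whether the element has its own compositing layer.

```python
# Property groups taken from the examples in this article (not exhaustive).
LAYOUT_PROPS = {"width", "height", "padding", "margin"}
PAINT_PROPS = {"color", "background-color", "visibility", "outline"}
COMPOSITE_PROPS = {"transform", "opacity"}


def stages_for(changed_props):
    """Return the pipeline stages that would typically rerun for a style change."""
    changed = set(changed_props)
    if changed & LAYOUT_PROPS:
        return ["layout", "paint", "composite"]
    if changed & PAINT_PROPS:
        return ["paint", "composite"]
    if changed & COMPOSITE_PROPS:
        return ["composite"]
    return []


print(stages_for(["width"]))             # ['layout', 'paint', 'composite']
print(stages_for(["color"]))             # ['paint', 'composite']
print(stages_for(["opacity"]))           # ['composite']
print(stages_for(["color", "padding"]))  # ['layout', 'paint', 'composite']
```

The most expensive property in a change decides how far back in the pipeline the browser has to start.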

Performance Optimization

  • Minimize Layouts: Avoid changing many individual inline styles one at a time; instead, toggle a class name so the changes are applied together.

  • Avoid Layout Thrashing: Do not alternate reading and writing DOM properties (e.g., getting offsetWidth immediately after setting width), as this forces multiple, slow synchronous reflows.

  • Use Transform/Opacity: Use CSS transform and opacity for animations, as these often bypass layout and painting, allowing for GPU-accelerated compositing.
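Layout thrashing is easier to see with a counter. The sketch below simulates it outside a browser: `FakeDocument` is an invented stand-in that marks layout as stale on every write and recomputes it whenever a geometry value is read while stale, which is roughly how reading `offsetWidth` after a style change forces a synchronous reflow. The class and function names are assumptions for illustration only.

```python
class FakeDocument:
    """A toy model of a page that counts how often layout has to run."""

    def __init__(self):
        self.layout_count = 0
        self.dirty = False
        self.widths = {}

    def _run_layout_if_needed(self):
        if self.dirty:
            self.layout_count += 1
            self.dirty = False

    def set_width(self, element, px):
        self.widths[element] = px
        self.dirty = True  # geometry is now stale

    def offset_width(self, element):
        self._run_layout_if_needed()  # a read forces layout if anything changed
        return self.widths.get(element, 0)

    def render_frame(self):
        self._run_layout_if_needed()  # the normal once-per-frame layout


def thrash(doc, elements):
    """Read, then write, for each element in turn (the pattern to avoid)."""
    for el in elements:
        doc.set_width(el, doc.offset_width(el) + 10)
    doc.render_frame()


def batched(doc, elements):
    """Do all reads first, then all writes."""
    current = {el: doc.offset_width(el) for el in elements}
    for el in elements:
        doc.set_width(el, current[el] + 10)
    doc.render_frame()


items = ["a", "b", "c", "d", "e"]

slow = FakeDocument()
thrash(slow, items)
print(slow.layout_count)  # 5

fast = FakeDocument()
batched(fast, items)
print(fast.layout_count)  # 1
```

Both versions leave every element 10 pixels wider, but the interleaved version runs layout once per element while the batched version runs it once per frame.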