Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages, as they can render and interpret HTML the same way a browser would, including styling elements such as page layout, color and font selection, and the execution of JavaScript and Ajax, which are usually not available when using other testing methods.[1]
Since version 59 of Google Chrome[2][3] and version 56[4] of Firefox,[5] there has been native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.[6]
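As an illustration of this native remote control, the following Python sketch builds a command line that starts Chrome without a window and derives the address of its debugging endpoint. It is a minimal example under stated assumptions, not a reference implementation: the function names, the default port 9222 and the binary path passed by the caller are illustrative choices, and the sketch relies on Chrome's `--headless` and `--remote-debugging-port` switches and on the `/json/version` endpoint of the DevTools HTTP interface.

```python
import json
import urllib.request
from typing import List


def headless_chrome_command(binary: str, port: int = 9222) -> List[str]:
    """Build an argument list that starts Chrome without a visible window.

    `binary` is whatever path or name the caller's system uses for Chrome
    or Chromium (an assumption; it varies between installations).
    """
    return [
        binary,
        "--headless",                       # no graphical window
        f"--remote-debugging-port={port}",  # expose the remote-control endpoint
        "about:blank",                      # page to open at startup
    ]


def devtools_version_url(port: int = 9222, host: str = "127.0.0.1") -> str:
    """Address of the HTTP endpoint that reports browser version details."""
    return f"http://{host}:{port}/json/version"


def read_browser_version(port: int = 9222) -> dict:
    """Query an already running headless instance over the network."""
    with urllib.request.urlopen(devtools_version_url(port), timeout=5) as resp:
        return json.load(resp)
```

With a Chrome or Chromium binary installed, passing the list from `headless_chrome_command` to `subprocess.Popen` and calling `read_browser_version()` after a short delay should return a dictionary describing the running browser; the exact keys depend on the browser version.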
Headless browsers are also useful for web scraping.[7] Google stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax.[8]
Headless browsers have also been misused in various ways. However, a 2018 study of browser traffic found no preference by malicious actors for headless browsers.[3] There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes such as DDoS attacks, SQL injections or cross-site scripting attacks.
Usage
As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:
QF-Test, a software tool for automated testing of programs via the graphical user interface, in which a headless browser can also be used for testing.
Alternatives
Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom[18] is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than full browsers, but are unable to correctly interpret many popular websites.[19][20][21]
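The trade-off described above can be sketched in Python. The example is a deliberately simplified stand-in: it uses the standard-library `HTMLParser` in place of a tool such as jsdom and, unlike jsdom, executes no JavaScript at all, so it overstates the limitation. The class `TextCollector`, the function `static_text` and the sample page `PAGE` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text found in static markup, ignoring <script> bodies."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self._in_script:
            self.chunks.append(text)


def static_text(html: str) -> list:
    """Return the text a non-rendering, non-executing parser can see."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page whose main content is filled in by a script at load time.
PAGE = """
<html><body>
<h1>Prices</h1>
<div id="app"></div>
<script>document.getElementById("app").textContent = "Widget: 9.99";</script>
</body></html>
"""

print(static_text(PAGE))  # prints ['Prices']
```

The parser finishes quickly but reports only the heading, because the price is written into the page by a script that never runs. A headless browser loading the same page would execute the script and expose the completed DOM.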
Another alternative is HtmlUnit, a headless browser written in Java. HtmlUnit uses the Rhino engine to provide JavaScript and Ajax support as well as partial rendering capability.[22][23]
List of headless browsers
The following software provides headless browser APIs.
Splash is a headless web browser written in Python using the WebKit layout engine via Qt. It has an HTTP API, Lua scripting support and a built-in IPython (Jupyter)-based IDE. Development started at ScrapingHub in 2013; it is partially funded by DARPA.[24][25]
Zombie.js is a simulated browser environment for Node.js.[26]
SimpleBrowser is a headless web browser written in C# supporting .NET Standard 2.0.[27]
DotNetBrowser is a proprietary .NET Chromium-based library that provides the off-screen rendering mode and can be used without embedding or displaying windows.[28][29]
Another noted earlier effort was envjs (2008) by John Resig, a simulated browser environment written in JavaScript for the Rhino engine.[30]
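The HTTP API mentioned in the Splash entry above can be illustrated with a short Python sketch that builds a request URL for a rendered page. It assumes a Splash instance listening at `http://localhost:8050`, and it uses Splash's `render.html` endpoint with its `url` and `wait` parameters. The function name `splash_render_url` and the example address are hypothetical.

```python
from urllib.parse import urlencode


def splash_render_url(page_url: str, wait: float = 0.5,
                      base: str = "http://localhost:8050") -> str:
    """Build a render.html request for a Splash instance assumed at `base`.

    `wait` is the number of seconds Splash waits after the page loads
    before returning the rendered HTML.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


print(splash_render_url("https://example.com/"))
```

Fetching the resulting URL with any HTTP client should return the page's HTML after scripts have run, provided a Splash server is actually running at the assumed address.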