Creating an alternative browser engine for iOS

Apple will soon be allowing alternative browser engines on iOS, and I think that is a great opportunity to learn about web browsers. With that in mind, I have three goals for this blog post:

Browsers are complicated programs that run untrusted code in the form of websites. These websites could be malicious, but users expect that an evil website can't hack their computer or steal their bank account password. The naive solution to this problem is to just not give the JavaScript running in the page an API to do anything dangerous. But historically websites have exploited bugs in browsers to run native code, not just the JavaScript APIs that the browser provides. If you are interested in the specifics of these attacks, I recommend LiveOverflow's Browser Exploitation playlist.

Even if your browser is exploit free, you can expect clever JavaScript to use Spectre to read any memory within its own process. So the modern approach to browser design is to concede that an evil website probably can run native code somehow, and to isolate it from the rest of the browser well enough that this isn't a problem. We do that by separating the browser into multiple operating system processes which communicate only via a limited IPC interface. If our interface is secure, our browser can remain secure even if a child process has been compromised.
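To make "a secure interface" a little more concrete: the more privileged side of an IPC call can treat every argument as hostile and check it before acting on it. The helper below is a minimal sketch under that assumption. `IsFetchableURL` is a hypothetical name, not part of BrowserEngineKit or of any real browser, and a production engine would need far more policy than a scheme check.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of BrowserEngineKit or any real browser.
// Returns YES only for absolute http(s) URLs that have a non-empty host.
static BOOL IsFetchableURL(NSString *urlString) {
    NSURL *url = [NSURL URLWithString:urlString];
    if (url == nil) {
        return NO;
    }
    // A missing scheme yields nil here, and messaging nil returns NO.
    NSString *scheme = url.scheme.lowercaseString;
    BOOL isWebScheme = [scheme isEqualToString:@"http"] ||
                       [scheme isEqualToString:@"https"];
    return isWebScheme && url.host.length > 0;
}
```

A process that receives a URL string from a less trusted process could call this first and simply drop requests such as `file:///etc/passwd` instead of passing them on.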

If you are interested in how real browsers do this, Chromium uses *.mojom files and WebKit uses their * files to define their IPC interface. Looking at the IPC interfaces of real browsers is also a great way to get your bearings within their large codebases.

Apple mandates that our browser engine be of this multi-process design. Previously it was not possible to split your iOS app into multiple processes. Apple has added new APIs in BrowserEngineKit that let you split your app into four processes which they call extensions.

Apple lets you launch one Networking and one Rendering process, and as many WebContent processes as you want. You get an XPC interface to talk to each of these processes. The catch is that you need to request an entitlement from Apple to use these APIs, and you have to live in the EU. I thought maybe I would be able to use them for development, but I've had no such luck. However, I was able to hack together enough private APIs to get a multiprocess architecture working via XPC, so I will proceed as if I had access to the real APIs. Sure, I wouldn't be able to submit this to the App Store, but I can't do that anyway because I live in the USA.

In the Main process, I'll start by creating an address bar for the user to type in the URL of the web page.

A blank app with an address bar.

Once the user has entered a URL, we need to fetch it. This happens in the Networking process. Here is the XPC interface of my networking process:

    @protocol NetworkingProtocol <NSObject>
    - (void)fetchURL:(NSString *)url complete:(void (^)(NSData *))complete;
    @end

The job of our Networking component is to download URLs and give us back the data. Apple has a nice API for us to use here in the form of NSURLSession.

    - (void)fetchURL:(NSString *)url complete:(void (^)(NSData *))complete {
        // Bypass the local cache and give up after 60 seconds.
        NSURLRequest *request = [NSURLRequest requestWithURL:[NSURL URLWithString:url]
                                                 cachePolicy:NSURLRequestReloadIgnoringCacheData
                                             timeoutInterval:60.0];
        NSURLSession *session = [NSURLSession sessionWithConfiguration:[NSURLSessionConfiguration defaultSessionConfiguration]];
        NSURLSessionDataTask *task = [session dataTaskWithRequest:request completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            // data is nil if the request failed; the error is not forwarded.
            complete(data);
        }];
        // Tasks are created suspended, so start it explicitly.
        [task resume];
    }

And logging the output back in the main process gives us:

    <!DOCTYPE html>
    <html>
    <head>
      <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/rss.xml" />
    </head>
    <body>
      <h1>Blog</h1>
      <ul>
        <li><a href="">Creating an alternative browser engine for iOS</a></li>
        <li><a href="">How long is 100 milliseconds?</a></li>
        <li><a href="">AI Found a Bug in My Code</a></li>
      </ul>
      <h1>Some Tools</h1>
      <ul>
        <li><a href="useragent.php">useragent</a></li>
        <li><a href="ip.php">ip</a></li>
        <li><a href="ascii">ascii</a></li>
        <li><a href="slow.php">slow</a></li>
      </ul>
      Twitter: <a href="">@JoelEinbinder</a>
      <br />
      Mastodon: <a rel="me" href=""></a>
      <br />
      Threads: <a href="">@joeleinbinder</a>
    </body>
    </html>

Next up is to turn that HTML text into DOM elements. This happens in the WebContent process. Here is the XPC interface of my WebContent process:

    @protocol WebContentProtocol <NSObject>
    - (void)setSource:(NSString *)source;
    - (void)resize:(CGSize)size;
    - (void)tap:(CGPoint)point;
    @end

    @protocol WebContentProtocolHost <NSObject>
    - (void)requestRepaint:(CommandList *)list;
    - (void)requestNavigation:(NSString *)url;
    - (void)showMessageBox:(NSString *)text;
    - (void)requestData:(NSString *)url completion:(void (^)(NSData *data))completion;
    @end

Luckily my website is valid XML, so I am just going to use Apple's XML parser instead of writing my own HTML parser. But for a real browser, you are going to want to incrementally parse HTML here. Running [documentElement textContent] gives:

Blog Creating an alternative browser engine for iOS How long is 100 milliseconds? AI Found a Bug in My Code Some Tools useragent ip ascii slow Twitter: @JoelEinbinder Mastodon: Threads: @joeleinbinder
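As a rough illustration of this step, here is a minimal sketch that pulls the text out of an XML document with Foundation's NSXMLParser. It assumes ARC, and it is not the DOM code from this project: `TextCollector` and `TextContentOfXML` are hypothetical names, and the sketch only concatenates character data instead of building element objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that accumulates every run of character data,
// similar in spirit to reading textContent on the document element.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// May be called several times per text node, so always append.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Returns the concatenated text, or nil if the source is not well-formed XML.
static NSString *TextContentOfXML(NSString *source) {
    NSData *data = [source dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TextCollector *collector = [[TextCollector alloc] init];
    parser.delegate = collector;   // the parser does not retain its delegate
    if (![parser parse]) {         // parse runs synchronously
        return nil;
    }
    return [collector.text copy];
}
```

A DOM builder would additionally implement the start-element and end-element delegate callbacks to push and pop nodes; that part is left out here.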

The next step is to render these DOM elements on the GPU. But we are in the WebContent process, which should not have access to the GPU. So we will create a list of all the graphics commands that we want to run and send them to the *Rendering* process. I call this a command list, but WebKit calls it a display list and Chromium calls it a command buffer.

    @interface CommandList : NSObject <NSSecureCoding>
    - (void)render;
    - (float)width;
    - (float)height;
    - (instancetype)initWithSize:(CGSize)size;
    - (void)drawText:(NSString *)text atPoint:(CGPoint)point withAttributes:(NSDictionary<NSAttributedStringKey, id> *)attributes;
    - (void)drawButton:(NSString *)text inRect:(CGRect)rect;
    - (void)drawImage:(UIImage *)image inRect:(CGRect)rect;
    - (void)addChildViews:(UIView *)toView;
    @end

The DOM elements in the WebContent process call drawText and drawImage to fill up the command list. Then the list is serialized over XPC and sent to the Rendering process, where we will want to call render.
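The serialization step rests on NSSecureCoding. Below is a minimal sketch of what one entry in such a list might look like, assuming ARC. `DrawTextCommand` and `RoundTrip` are hypothetical names, the real CommandList internals are not shown in this post, and NSKeyedArchiver only stands in for the XPC hop so the example is self-contained.

```objc
#import <Foundation/Foundation.h>

// Hypothetical single command; a real list would hold many of these.
@interface DrawTextCommand : NSObject <NSSecureCoding>
@property (nonatomic, copy) NSString *text;
@property (nonatomic) double x;
@property (nonatomic) double y;
@end

@implementation DrawTextCommand
+ (BOOL)supportsSecureCoding {
    return YES;
}

- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeObject:self.text forKey:@"text"];
    [coder encodeDouble:self.x forKey:@"x"];
    [coder encodeDouble:self.y forKey:@"y"];
}

- (instancetype)initWithCoder:(NSCoder *)coder {
    if ((self = [super init])) {
        // Naming the expected class is what makes the decoding "secure":
        // a payload containing any other class for this key is rejected.
        _text = [[coder decodeObjectOfClass:[NSString class] forKey:@"text"] copy];
        _x = [coder decodeDoubleForKey:@"x"];
        _y = [coder decodeDoubleForKey:@"y"];
    }
    return self;
}
@end

// Archives and unarchives a command, standing in for a trip across processes.
static DrawTextCommand *RoundTrip(DrawTextCommand *command) {
    NSError *error = nil;
    NSData *data = [NSKeyedArchiver archivedDataWithRootObject:command
                                         requiringSecureCoding:YES
                                                         error:&error];
    if (data == nil) {
        return nil;
    }
    return [NSKeyedUnarchiver unarchivedObjectOfClass:[DrawTextCommand class]
                                             fromData:data
                                                error:&error];
}
```

The receiving side only ever sees plain data such as strings and numbers, which is the point: the WebContent process describes what to draw, and never gets to run drawing code itself.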

Drawing 2D graphics efficiently is a really complicated problem, but luckily there are a bunch of libraries out there which do a great job. I'll use Quartz, because it's built into iOS. But you could use Google's Skia, which powers both Chromium and Firefox. WebKit of course uses Quartz on Apple platforms, but they use Cairo on Linux and Windows.

If you are a web developer, the Quartz API probably looks familiar. The <canvas> element was created by Apple as a thin wrapper around Quartz before it was standardized into the modern web.

Here is the XPC interface of my Rendering process:

    @protocol RenderingProtocol <NSObject>
    - (UIView *)renderCommandList:(CommandList *)list;
    @end

All we have to do next is call [list render] from within the drawRect: method of a UIView. Except I don't know how to display UIView objects outside of the main process, and I expect that even if I did it would be gated behind the same entitlements that I don't have access to.

Apple documentation tells you to use the method createVisibilityPropagationInteraction to ensure that your child processes are allowed to draw to the screen. But this is an otherwise undocumented private method. It is in WebKit though, and surrounded by promising looking graphics code. If someone knows how to draw graphics out-of-process then please let me know.

For now I will implement Rendering inside of the main process, and just communicate with it over an XPC-like api as if it were out of process.

The minimal website is rendered.

It is extremely satisfying to see a real web page displayed by all of my own layout and rendering code. Even if it is one as spartan as my home page. But I said that I wanted to render this web page, not my home page. Which means there is a lot of work still ahead, starting with being able to tap on that link to navigate to this blog post.

In a modern browser, handling user input is a multi-process affair. I start by detecting the tap in the Main process. Then I send the tap location to the WebContent process via the tap: method. The WebContent needs to figure out which DOM element was tapped, and then request a navigation from the Main process because it was a link. The Main process then sends a request to the Networking process to fetch the new URL. After the data comes back to the main process, it is sent back to the WebContent process to be parsed and rendered.
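The "figure out which DOM element was tapped" step is usually called hit testing. Here is a minimal sketch of one way to do it, assuming ARC and assuming each laid-out element keeps an absolute frame. `LayoutBox` and `LinkAtPoint` are hypothetical names and not the types used in this project.

```objc
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

// Hypothetical layout node: a rectangle in page coordinates, an optional
// link target, and child boxes stored in paint order.
@interface LayoutBox : NSObject
@property (nonatomic) CGRect frame;
@property (nonatomic, copy) NSString *href;
@property (nonatomic, strong) NSMutableArray<LayoutBox *> *children;
- (instancetype)initWithFrame:(CGRect)frame href:(NSString *)href;
@end

@implementation LayoutBox
- (instancetype)initWithFrame:(CGRect)frame href:(NSString *)href {
    if ((self = [super init])) {
        _frame = frame;
        _href = [href copy];
        _children = [NSMutableArray array];
    }
    return self;
}
@end

// Returns the link target under the point, or nil if no link was hit.
// Children are checked last-to-first so the box painted on top wins.
static NSString *LinkAtPoint(LayoutBox *box, CGPoint point) {
    if (!CGRectContainsPoint(box.frame, point)) {
        return nil;
    }
    for (LayoutBox *child in [box.children reverseObjectEnumerator]) {
        NSString *href = LinkAtPoint(child, point);
        if (href != nil) {
            return href;
        }
    }
    // No child supplied a link, so fall back to this box's own target.
    return box.href;
}
```

With something like this, the tap: handler could call LinkAtPoint on the root box and, when it gets a non-nil result, forward it to the Main process through requestNavigation:.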

The web page has no word wrapping and no styling.

Well that doesn't look quite right. This web page requires word wrapping and CSS and images, and I don't have any of that yet.

Word wrapping and CSS are difficult but not that interesting so I will skip explaining them and just implement them.

The web page is fully styled.

Images require the WebContent to be able to request the image data from the Networking process via requestData:completion:. Then once the image data fully loads, the WebContent re-renders itself by generating a new command list. A real browser would only re-render what has changed, but I just re-render the whole web page. It's ok, iPhones are quite fast. I also had to implement drawImage:inRect: in the Rendering process, which I did by calling drawInRect: on a UIImage.

The web page renders images.

While we are messing with the command list, I also want to be able to render this button as a native UIButton.

I added a drawButton:inRect: method to the Rendering process, and then I added a drawButton call to the WebContent process. The Rendering process creates a UIButton and adds it to the view.

The button is rendered to the screen.

Now would be the time to implement JavaScript inside the WebContent process. Luckily this page doesn't contain any JavaScript, so I won't. But you could use Apple's JavaScriptCore library, which is the same JavaScript engine that Safari uses. Chromium's V8 engine also has had limited support for iOS for a while. I don't know if Mozilla's SpiderMonkey engine has iOS support yet, but I imagine it wouldn't be that hard to add given they definitely already support Apple Silicon Macs.

The main thing that makes JavaScript and WebAssembly fast is also what makes them insecure: JIT. JIT is when the JavaScript engine converts JavaScript code into native assembly and then directly runs that assembly code. A bug in the JIT system could allow an attacker to run arbitrary native code. For this reason, Apple doesn't allow iOS apps to write to executable memory, preventing JIT. But in BrowserEngineKit, we get new APIs to enable JIT in the WebContent process only. This should let you run a full speed JavaScript and WebAssembly engine!

Why did I put so much effort into creating my own alternative browser when Safari is just fine? Why did I program in CSS and images and a multi-process architecture when I could have just used WebKit? But you see, this website doesn't work properly in Safari. When I said that this page doesn't have any JavaScript, I meant that it doesn't have any JavaScript.

In JoelBrowser if you click that button...

A message box says 'VBScript is back!'

If you want, you can check out the code on GitHub or subscribe to my email list.