Writing Your Own UriParser

I posted before about my displeasure using System.Uri for advanced scenarios. It won’t parse mailto-style URIs and extending it is just evil. Yet, I’d still recommend you use System.Uri for your URI needs, even if it won’t parse your URI.

 

Why? Well, with each version of the framework, Microsoft will work to make it better (at least that’s what they promise us). Your classes should version well across framework versions if you use their guidelines. Therefore, if you’re using FxCop to check your code against the guidelines, I’d recommend fixing violations of the “Use System.Uri instead of string” rule.

 

It just so happens that they kinda worked around my two earlier complaints: they deprecated those heinous protected methods; currently you only get a compiler warning, but it’s a start. They also provided a mechanism to parse URI schemes that the framework doesn’t know about: extend UriParser.

 

Before I start with the nitty-gritty details of building your own UriParser, I’d like to point out something amusing (well, amusing to total nerds who’d rather write about parsing URIs than play video games or watch Arrested Development). Go to the documentation on MSDN2 for UriParser and you’ll read this:

 

The UriParser class enables you to create parsers for new URI schemes. You can write these parsers in their entirety, or the parsers can be derived from well-known schemes (HTTP, FTP, and other schemes based on network protocols). If you want to create a completely new parser, inherit from GenericUriParser. If you want to create a parser that extends a well-known URI scheme, inherit from FtpStyleUriParser, HttpStyleUriParser, FileStyleUriParser, GopherStyleUriParser, or LdapStyleUriParser.

Notice something missing? That’s right, there is no fucking MailtoStyleUriParser!

 

Although I’m glad gopher is finally getting its due.

 

When the product feedback center came out, I figured I’d give it a shot so I entered a new feature request to get Uri to parse sip, sips and tel URIs. Read the comment at the bottom by Mr. Daniel Roth; you’ll see that writing your own mailto-style URI parser is easy because mailto is a “so rudimentary” scheme. That kinda begs the question: if it’s so rudimentary, then WHY ISN’T THERE ONE IN THE FRAMEWORK?

 

To add further to the fire, here’s the second paragraph from the UriParser class description:

 

Microsoft strongly recommends that you use a parser shipped with the .NET Framework. Building your own parser increases the complexity of your application, and will not perform as well as the shipped parsers.

Hmph.

I’m not going to go into detail about inheriting from GenericUriParser in this post. Instead, let’s suppose I’ve already done that. I think I should show you how to let the framework know about your parser so that it will load it and use it. I feel obligated to tell you about it because the documentation for these methods is really fucking terrible. Suppose I have two types called PresUriParser and SipUriParser both of which inherit from GenericUriParser. To tell the framework about these parsers, I call the UriParser.Register(uriParser, schemeName, defaultPort) method:

 

       UriParser.Register(new PresUriParser(), "pres", -1);
       UriParser.Register(new SipUriParser(), "sip", 5060);

 

That browser of yours isn’t playing tricks: the first call has a port of -1; that means no default port. The terrible, terrible documentation doesn’t show that. Nor does it show that every argument passed is validated and an exception is thrown if validation fails. The exception table should be the following for argument validation:

 

Type of Exception              Reason
ArgumentNullException          uriParser or schemeName is null
ArgumentOutOfRangeException    schemeName has length 1 or violates the rules for scheme names in any of myriad ways
ArgumentOutOfRangeException    defaultPort >= 65535 or (defaultPort < 0 and defaultPort != -1)
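Here’s a rough sketch that pokes at each row of that table. The PresUriParser below is a hypothetical stand-in (an empty subclass using GenericUriParserOptions.Default), and the Attempt helper is my own invention for reporting which exception came back; neither comes from the framework. Every call is meant to fail validation, so nothing should actually get registered.

```csharp
using System;

// Hypothetical stand-in for the PresUriParser mentioned above.
public class PresUriParser : GenericUriParser
{
    public PresUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class RegisterValidationDemo
{
    // Attempts a registration and returns the exception type name, or "none".
    public static string Attempt(UriParser parser, string scheme, int port)
    {
        try
        {
            UriParser.Register(parser, scheme, port);
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        // Each call targets one rule from the exception table.
        Console.WriteLine(Attempt(null, "pres", -1));                    // null parser
        Console.WriteLine(Attempt(new PresUriParser(), null, -1));       // null scheme name
        Console.WriteLine(Attempt(new PresUriParser(), "p", -1));        // one-character scheme
        Console.WriteLine(Attempt(new PresUriParser(), "1pres", -1));    // scheme starts with a digit
        Console.WriteLine(Attempt(new PresUriParser(), "pres", 65535));  // port too large
        Console.WriteLine(Attempt(new PresUriParser(), "pres", -2));     // negative, but not -1
    }
}
```

If the table is right, the first two lines print ArgumentNullException and the rest print ArgumentOutOfRangeException.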

 

But wait, there’s more! Even if you get past the argument validation, you’re not yet through the exception danger zone. Suppose you had code like this:

 

       PresUriParser p = new PresUriParser();
       UriParser.Register(p, "pres", -1);
       UriParser.Register(p, "presence", 5060);

 

This will throw an InvalidOperationException; apparently, you can only have one instance parse one scheme, even if your parser can parse multiple schemes. This makes sense, especially in multi-threaded scenarios, but does the documentation say this? Hell, no!
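The obvious workaround is one fresh instance per scheme. A minimal sketch, assuming the same hypothetical PresUriParser (an empty GenericUriParser subclass) and a made-up Attempt helper that reports the exception type; the “presence” scheme name is only for illustration:

```csharp
using System;

// Hypothetical stand-in for the PresUriParser mentioned above.
public class PresUriParser : GenericUriParser
{
    public PresUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class OneParserPerSchemeDemo
{
    // Attempts a registration and returns the exception type name, or "none".
    public static string Attempt(UriParser parser, string scheme, int port)
    {
        try
        {
            UriParser.Register(parser, scheme, port);
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        PresUriParser shared = new PresUriParser();

        // First registration claims the instance for "pres".
        Console.WriteLine(Attempt(shared, "pres", -1));

        // Reusing the same instance for a second scheme is rejected.
        Console.WriteLine(Attempt(shared, "presence", 5060));

        // A fresh instance of the same type can take the second scheme.
        Console.WriteLine(Attempt(new PresUriParser(), "presence", 5060));

        // IsKnownScheme reports whether the framework now has a parser for it.
        Console.WriteLine(UriParser.IsKnownScheme("presence"));
    }
}
```

Registration is process-wide, so code like this belongs in startup logic that runs exactly once; run it twice in the same process and the “already registered” exception shows up instead.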

 

So you get the argument validation and now a potentially invalid operation. All undocumented. But I have more for you: a free set of knives!

 

Suppose you wanted to parse http on your own, with code like this:

 

  public class HttpUriParser : HttpStyleUriParser
  {
    public HttpUriParser() { }
  }

 

And then you register your parser, like so:

 

  UriParser.Register(new HttpUriParser(), "http", 80);

 

Guess what? InvalidOperationException! That scheme is already registered.

Makes me wonder why there are all those parsers for known schemes if you can’t use them on the schemes that they parse.
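As far as I can tell, the intended use is to borrow the well-known parsing rules for a brand new scheme name rather than to replace the built-in one. A hedged sketch: “myhttp” is a scheme I made up for illustration, and the Attempt helper is hypothetical too. UriParser.IsKnownScheme lets you check before you register.

```csharp
using System;

// Same empty subclass as in the post.
public class HttpUriParser : HttpStyleUriParser
{
    public HttpUriParser() { }
}

public static class BuiltInSchemeDemo
{
    // Attempts a registration and returns the exception type name, or "none".
    public static string Attempt(UriParser parser, string scheme, int port)
    {
        try
        {
            UriParser.Register(parser, scheme, port);
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        // The framework already owns "http", so this check comes back true...
        Console.WriteLine(UriParser.IsKnownScheme("http"));

        // ...and trying to register over it is rejected.
        Console.WriteLine(Attempt(new HttpUriParser(), "http", 80));

        // Registering the same parser type under a new, made-up scheme name.
        Console.WriteLine(Attempt(new HttpUriParser(), "myhttp", 80));

        // Uri should now parse the new scheme with HTTP-style rules.
        Uri uri = new Uri("myhttp://example.com/index.html");
        Console.WriteLine(uri.Host);
    }
}
```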

 

There was a way to do it through the config file, but that content has been retired according to MSDN2. There appear to be some issues with Uri and security, outlined here in the breaking changes list. Perhaps Microsoft didn’t want others screwing up the perfectly good parsing for the known schemes, like they can by overriding the methods I mentioned last time.

 

Whatever the case, hopefully this quick article helps with the documentation for System.UriParser. Next time, we’ll get our hands dirty overriding methods and abusing Console.WriteLine().