Sunday, May 15, 2011

Wicket 1.4 URL Character Encoding Problems

While developing web apps with Wicket I ran into a problem with character encoding in URLs. If you use the default settings and deploy your application to Tomcat, you can reproduce the issue quite simply:
  • Create a page that prints a parameter read from the URL (using the standard query string URL coding strategy).
  • Call the page directly from the browser with a special character in the parameter value.
  • Notice that the special character has been decoded wrongly, and now you're scratching your head ;) over how to fix it.
So my first thought was "this is surely a Wicket issue!", but I soon learned the problem goes much deeper than that. It turns out that most browsers don't tell the server which encoding they used for the request, so it's up to the server to guess.

Modern browsers can (and actually do) make their requests in UTF-8, but most containers assume the default encoding is ISO-8859-1, because that's what you can read in the HTTP protocol RFC. From the HTTP protocol definition:
The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
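To make the mismatch concrete, here is a small sketch that uses only the JDK. The class name QueryDecodingDemo and the sample value are made up for illustration, and this is not Tomcat's internal code; it only shows the same percent-encoded bytes decoded under the two charsets.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; the name is not from the original post.
class QueryDecodingDemo {

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "%C3%A9" is U+00E9 (e with acute accent) percent-encoded as UTF-8,
        // which is what a UTF-8 browser puts on the wire.
        String sent = "%C3%A9";

        // Decoded as UTF-8, the two bytes collapse into one character.
        System.out.println(decode(sent, "UTF-8").length());      // prints 1

        // Decoded as ISO-8859-1, each byte becomes its own character
        // (U+00C3 followed by U+00A9), which is the garbage seen on the page.
        System.out.println(decode(sent, "ISO-8859-1").length()); // prints 2
    }
}
```

The second result is the kind of wrongly decoded value described in the steps above: one character sent, two unrelated characters received.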
OK. Now, Wicket has a lot of settings you can configure on the web application. One of them is the request cycle character encoding, which defaults to UTF-8, and this leads us to the first solution to our problem.

The first solution: Change the request cycle character encoding.

This can be done in a very simple way, with just a line of code:
 getRequestCycleSettings().setResponseRequestEncoding("ISO-8859-1");
By doing this, you tell Wicket to treat requests and responses as if they were in ISO-8859-1; special characters in the URL get URL-encoded and everything works just fine. I do suspect that some strings with special characters get broken, but I really can't confirm that (they could be getting broken for some other reason).
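As a rough illustration of what URL-encoding in ISO-8859-1 means, the sketch below uses the JDK's URLEncoder directly. It is not Wicket's own code, and the class name Latin1EncodingDemo is hypothetical. It also shows one way a string could get broken under this charset, although that is only a possible explanation and not something confirmed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; plain JDK behaviour, not Wicket internals.
class Latin1EncodingDemo {

    // Percent-encodes a value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 exists in ISO-8859-1, so it becomes the single byte 0xE9.
        System.out.println(encode("\u00e9", "ISO-8859-1")); // prints %E9

        // U+20AC (euro sign) does not exist in ISO-8859-1, so the JDK
        // substitutes '?' before percent-encoding and the character is lost.
        System.out.println(encode("\u20ac", "ISO-8859-1")); // prints %3F
    }
}
```

Characters inside the ISO-8859-1 range survive the round trip, while anything outside it is replaced, so this setting is only safe when all parameter values fit in that charset.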

Now, I don't know about you, but I really don't feel comfortable having my requests and responses in ISO-8859-1, especially because I like UTF-8. So, since the problem is that the container assumes requests are in ISO-8859-1, why not tell the container to assume UTF-8 by default instead?

The second solution: Let the container assume the requests are in UTF-8.

To make this solution work we need access to the production server's configuration, and it's quite easy: Tomcat's HTTP connector (defined in server.xml) has an attribute called URIEncoding. We just need to set it and we're done:
 <Connector port="8080" protocol="HTTP/1.1"
            connectionTimeout="20000"
            redirectPort="8443" URIEncoding="UTF-8" />
This solution is quite pretty and works like a charm as well, and you can stick with your beloved UTF-8 charset.

I have a third solution, but it isn't pretty.

The third solution: Write your own QueryStringUrlCodingStrategy

This solution is where I ended up when I first met the issue, because I thought it was a Wicket issue. It was exactly that: subclass QueryStringUrlCodingStrategy, write your own implementation of the decode parameters method, and decode the parameters yourself! At the time it worked, but now I can see it has a lot of side problems, so I think you'll want to stay away from this solution.
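The original implementation isn't shown here, so the following is only a sketch of the general idea of decoding a parameter yourself, without the Wicket subclass around it. The class and method names (ParameterRepair, latin1ToUtf8) are hypothetical. It assumes the container decoded UTF-8 bytes as ISO-8859-1 and reverses that step.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the code from the original post.
class ParameterRepair {

    // Recovers the raw bytes of a value that was decoded as ISO-8859-1
    // and reinterprets those bytes as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 U+00A9 is what U+00E9 looks like after the wrong decode.
        String broken = "\u00c3\u00a9";
        System.out.println(latin1ToUtf8(broken).equals("\u00e9")); // prints true

        // A value that was already correct is damaged by the same repair,
        // because the lone byte 0xE9 is not valid UTF-8 on its own.
        String alreadyFine = "\u00e9";
        System.out.println(latin1ToUtf8(alreadyFine).equals(alreadyFine)); // prints false
    }
}
```

The second case hints at the kind of side problem mentioned above: the repair is only correct when every incoming value really was decoded the wrong way, and it silently corrupts the ones that were not.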

Other solutions that didn't work for me.

While trying to solve this problem I also tried other solutions without luck. I tried using a servlet filter to set the encoding of the request to UTF-8, including the filters provided by the Spring Framework and Apache Tomcat. I also wrote my own filter to change the encoding, again with no luck (the problem didn't go away). Still, I think it's worth a try.
