w3.org DTD/xhtml1-strict.dtd blocks Windows IE users?

Posted by jriordon on February 20th, 2009

Updated on: February 23 2009
Updated on: February 25 2009

On a few sites I maintain, we have several man pages set up using XML and XSL. This week we started getting complaints from Windows IE users saying they can't see the man pages any more. The error message is:

CODE:
  The XML page cannot be displayed
  Cannot view XML input using style sheet.
  Please correct the error and then click
  the Refresh button, or try again later.

  -----------------------------------------

  The server did not understand the request,
  or the request was invalid. Error processing resource
  'http://www.w3.org/TR/xhtm...

The header of my page has this in it:

CODE:
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <!-- saved from url=(0013)about:internet -->

When I try to access either http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd or http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd in any browser except Windows IE, the page loads or downloads as expected. In Windows IE the only thing that is served up is "No".

I am curious whether others are seeing this. Is it a Microsoft problem? Is it a W3.org problem? As it is, Windows IE users appear to be out of luck. Perhaps w3.org has simply had enough of Windows IE and wants them to go away?

I would love to hear other people's results from trying to load these URLs, and their comments.


Updated on: February 23 2009
Further investigation shows that the User-Agent string is the key to IE being blocked from accessing the DTDs on w3.org.

CODE:
  curl http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd -D ./dump.txt -A "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

  Results:
  --------------
  No
  --------------

  dump.txt
  --------------
  HTTP/1.1 503 Go away
  Date: Mon, 23 Feb 2009 13:48:30 GMT
  Server: Apache/2
  Content-Location: msie7.asis
  Vary: negotiate,User-Agent
  TCN: choice
  Retry-After: 86400
  Cache-Control: max-age=21600
  Expires: Mon, 23 Feb 2009 19:48:30 GMT
  P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
  Content-Length: 2
  Connection: close
  Content-Type: text/plain
  --------------

  curl http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd -D ./dump.txt -A "Mozilla/4.0 (compatible; Windows NT 5.1; .NET CLR 1.1.4322)"

  Results:
  --------------
  .
  .
  .
  The entire DTD file successfully lists
  .
  .
  .
  --------------

  dump.txt
  --------------
  HTTP/1.1 200 OK
  Date: Mon, 23 Feb 2009 13:50:22 GMT
  Server: Apache/2
  Content-Location: xhtml1-transitional.dtd.raw
  Vary: negotiate,accept-encoding
  TCN: choice
  Last-Modified: Thu, 01 Aug 2002 18:37:56 GMT
  ETag: "7d6f-3a72ac59d0900;45a3e4327da00"
  Accept-Ranges: bytes
  Content-Length: 32111
  Cache-Control: max-age=7776000
  Expires: Sun, 24 May 2009 13:50:22 GMT
  P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
  Connection: close
  Content-Type: application/xml-dtd; charset=utf-8
  --------------

Further research shows that the offending part of the User-Agent string appears to be MSIE. Removing MSIE, or changing it in any way, results in a successful return of the DTD.
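
For anyone who would rather script the comparison than run curl by hand, here is a minimal Python sketch of the same test. The two User-Agent strings are the ones from the curl runs; the function names are my own, and what w3.org returns may well differ from the output shown above, so treat this as a way to check rather than a statement of what you will see.

```python
import urllib.error
import urllib.request

DTD_URL = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

# The two User-Agent strings from the curl runs; they differ only by the MSIE token.
UA_WITH_MSIE = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
UA_WITHOUT_MSIE = "Mozilla/4.0 (compatible; Windows NT 5.1; .NET CLR 1.1.4322)"


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return (status code, reason phrase) without raising on 4xx/5xx responses."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses such as the 503 seen above.
        return err.code, err.reason


def compare_user_agents(url=DTD_URL):
    """Fetch the URL once per User-Agent string and return the results by label."""
    return {
        "with MSIE": fetch_status(url, UA_WITH_MSIE),
        "without MSIE": fetch_status(url, UA_WITHOUT_MSIE),
    }
```

Calling compare_user_agents() makes two real requests to w3.org, so run it sparingly; the whole point of the block is that the server is already overloaded.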

I tried contacting w3.org last week when I first posted this, but obviously I have the wrong contact info as no one has responded yet.


Updated on: February 25 2009
I heard back from w3.org. They responded with:

This is a known issue related to W3C's excessive traffic [1]. We are
working with Microsoft, and a fix is expected in coming months.

[1] http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

It would appear that Windows IE is attempting to load the DTD on each page load, which is improper behaviour. Perhaps the only solution at this point is to host a copy of the DTD on our own server so that Windows users can still read the XML pages.
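
If we do go the self-hosting route, the pages need their DOCTYPE pointed at the local copy. A rough Python sketch of that rewrite follows; the function name and the example.com base URL are placeholders of mine, not anything we actually run, and it only handles a plain XHTML 1.0 DOCTYPE like the one shown earlier.

```python
import re

# Matches an XHTML 1.0 DOCTYPE whose system identifier points at w3.org.
# Group 1 is everything up to the opening quote of the system identifier,
# group 2 is the DTD file name.
W3C_DOCTYPE = re.compile(
    r'(<!DOCTYPE\s+html\s+PUBLIC\s+"[^"]*"\s+")'
    r'http://www\.w3\.org/TR/xhtml1/DTD/'
    r'(xhtml1-(?:strict|transitional|frameset)\.dtd)'
)


def use_local_dtd(document, local_base):
    """Point the DOCTYPE's system identifier at a self-hosted copy of the DTD."""
    base = local_base.rstrip("/")
    # A function replacement avoids backslash surprises in local_base.
    return W3C_DOCTYPE.sub(
        lambda m: "%s%s/%s" % (m.group(1), base, m.group(2)),
        document,
        count=1,
    )
```

The public identifier is left alone; only the URL the parser fetches changes. Documents without a matching DOCTYPE come back unchanged.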

Thoughts and suggestions are always welcome.



Reader Comments

Maybe IE simply needs to behave the way a (good?) browser is supposed to.
There are RFCs, specs, and rules for that. They are not made to be annoying, but so that the whole thing works.
If I’m not mistaken, there have been HTTP directives (since the beginning) specifying how often you’re supposed to check for a resource (something like cache, expire and so on…)
It would be good if one day MS engineers stopped playing only for themselves, got their finger out of their *ss, and acted more playfully with other people in the world ;)

Hi,

I have the same problem… if you choose

“http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd”

everything works fine…
but the 1.0 transitional and strict DTDs cause problems:

The server did not understand the request, or the request was invalid. Error processing resource ‘http://www.w3….

:-(

Hosting the DTD files yourself and changing the DOCTYPE to match produces a validation warning (not error) with the W3C validator, but seems to work in all the major browsers (so far I’ve tried it in IE6, IE7, FF3, Safari 3).

You will need to host 4 files: the .dtd file itself, and the 3 .ent files it includes that define entities. This totals about 64 kilobytes that will be loaded on every page request by Internet Explorer users for anything you serve up as application/xml.

I agree that it’s IE that should be fixed, but even if it were patched it would take quite a while for all the IE6 users to update.

I just figured out an alternative to hosting the DTD yourself.

If you are using client-side XSLT with XHTML, which would be the main use-case in which IE has this problem, IE does not require the doctype to be present in the source document. It’s just XML data at that point, before it’s transformed.

So, I created a server-side middleware component that detects the combination of client-side XSLT use and IE user-agent strings, and strips the doctype declaration. So far, it’s working.
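
The middleware itself is tied to my stack, but the core check can be sketched in a few lines of Python. This is an illustration only, not the component described above: the function names are made up, "MSIE" in the User-Agent stands in for real IE detection, and a DOCTYPE with an internal subset ([...]) is deliberately left untouched.

```python
import re

# DOCTYPE declaration without an internal subset, plus trailing whitespace.
DOCTYPE = re.compile(r"<!DOCTYPE\s[^>\[]*>\s*")
# The xml-stylesheet processing instruction and its type pseudo-attribute.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b(.*?)\?>", re.DOTALL)
TYPE_ATTR = re.compile(r"""\btype\s*=\s*["']([^"']*)["']""")


def uses_client_side_xslt(body):
    """True if the document asks the browser to apply an XSL stylesheet."""
    pi = STYLESHEET_PI.search(body)
    if not pi:
        return False
    type_attr = TYPE_ATTR.search(pi.group(1))
    # Covers text/xsl (what IE expects) and application/xslt+xml.
    return bool(type_attr) and "xsl" in type_attr.group(1).lower()


def strip_doctype_for_ie(body, user_agent):
    """Remove the DOCTYPE only for IE requests that use client-side XSLT."""
    if "MSIE" in user_agent and uses_client_side_xslt(body):
        return DOCTYPE.sub("", body, count=1)
    return body
```

Every other combination of browser and document passes through byte for byte, which keeps the browser detection as narrow as possible.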

Normally, I like to avoid browser detection. However, this problem is created by browser detection in the first place, so that seems to also be the solution. Even if the IE user is using an alternate user-agent string for some reason, this should work because they should then be able to download the DTD from the W3C.

Unfortunately, the above solution (strip doctype) breaks if you have named entities in your source document, because the XML parser cannot read them (except amp, quot, apos, lt and gt, of course) without a doctype.

Since I already wrote a routine to use libtidy to clean up XHTML content, it was a simple matter to use its “numeric-entities” option to convert named HTML entities to numeric entities, which don’t require the doctype.

If you don’t use libtidy, though, you’ll need a different solution.
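
One possible stand-in, for what it's worth: Python's standard library ships the HTML 4 entity table, so the same named-to-numeric conversion can be approximated without libtidy. This is a sketch, not what I actually run; the function name is made up, and it does a plain text substitution, so it will also rewrite entity-like text inside CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# These five are built into XML and resolve without any DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Rewrite named HTML entities as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY.sub(replace, text)
```

For example, named_to_numeric("caf&eacute; &amp; &copy;") gives "caf&#233; &amp; &#169;", which parses with no doctype present.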

Ah, the tangled web…

Hi Jason,

Thanks for the posts on this. We eventually went with your first suggestion and hosted the files ourselves. This has so far worked quite well and our man pages are once again running nicely.

I’ve tried changing the doctype to HTML 4 loose and stripping the doctype declaration as well, but in the first case the same error occurs and in the second the CSS layout goes crazy. I tried to copy xhtml11.dtd from W3C to my site but am still getting an error. Where am I supposed to find the 3 .ent files mentioned by Jason?

Fabio, if you look in the .dtd file you should be able to find the location of the 3 .ent files.
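
If reading the DTD by eye is tedious, a short Python sketch can list the references for you. It assumes the external parameter entities are declared in the form <!ENTITY % name PUBLIC "public id" "file.ent">, which is how I remember the XHTML 1.0 DTDs doing it; the function names are made up for this example.

```python
import re
from urllib.parse import urljoin

# External parameter entity: <!ENTITY % name PUBLIC "public id" "system id">
EXTERNAL_ENTITY = re.compile(
    r'<!ENTITY\s+%\s+(\S+)\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>'
)


def external_entity_files(dtd_text):
    """Return (entity name, system identifier) pairs declared in the DTD text."""
    return [(m.group(1), m.group(3)) for m in EXTERNAL_ENTITY.finditer(dtd_text)]


def entity_urls(dtd_text, dtd_url):
    """Resolve each system identifier against the URL the DTD was fetched from."""
    return [urljoin(dtd_url, path) for _, path in external_entity_files(dtd_text)]
```

Pass it the text of the DTD you downloaded and the URL it came from; relative file names are resolved against that URL, so the results are the files to copy next to your hosted DTD.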