June 9th, 2010

Removing duplicate namespaces in XML Literals (Shyam Namboodiripad)

A common problem with XML literals and the LINQ to XML API is duplicate XML namespace declarations. Consider the following example, in which the code imports a default XML namespace, “hello”.

Code:

Imports <xmlns="hello">

Module Module1
    Sub Main()
        ' <B> is embedded in the literal for <A>
        Dim x = <A>
                    <%= <B></B> %>
                </A>
        Console.WriteLine("x:")
        Console.WriteLine(x)

        ' <B> is created on its own and added afterwards
        Dim y = <A></A>
        y.Add(<B></B>)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

Output:

x:
<A xmlns="hello">
  <B></B>
</A>

y:
<A xmlns="hello">
  <B xmlns="hello"></B>
</A>

As you can see, in the output for variable y, the element <B> has a spurious (duplicate) XML namespace declaration. Although this XML is technically ‘correct’ (i.e. legal XML according to the XML 1.0 specification), several tools have problems consuming XML that contains duplicate namespaces.

For example, consider the code below, where I load an MSBuild project file (.vbproj) and try to modify its contents. MSBuild projects are essentially just XML files, so I can use VB XML literals to work with them. I have set the default XML namespace for my code to match the namespace of the XML in the MSBuild file. Notice that the output has the same problem as before: the <NoWarn> node that I added has a duplicate XML namespace declaration.

If I were to save the XML produced by this code as a “.vbproj” file and build it using MSBuild, MSBuild would fail to process the file because of the duplicate namespace on the <NoWarn> element. In other words, even though the XML is legal, MSBuild can’t consume XML that contains duplicate namespaces.

Code:

Imports <xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

Module Module2
    Sub Main()
        Dim projectFile As String = "….ConsoleApplication1.vbproj"
        Dim y = XDocument.Load(projectFile)

        ' This literal is built on its own, so it carries its own xmlns declaration
        Dim element = <NoWarn>42016</NoWarn>
        y.<Project>.<PropertyGroup>.First.Add(element)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

Output:

y:
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <NoWarn xmlns="http://schemas.microsoft.com/developer/msbuild/2003">42016</NoWarn>
  </PropertyGroup>
</Project>

Why does VB allow XML with duplicate namespaces to be generated?

Consider the first code example above. As you can see from the output for variable x, the VB compiler correctly figures out that a namespace declaration need not be emitted on node <B>. The compiler can figure this out because node <B> is ‘syntactically’ embedded (using an embedded expression) inside node <A>, which is already in the same namespace (i.e. “hello”).

For variable y, however, it is very hard for the VB compiler to know that the node <B> is actually going to end up inside node <A>. To know this, the compiler would have to inspect the code flow and figure out that the node <B> is being passed to a function named ‘Add’ that is defined on type ‘XElement’, and that the target object of that call (i.e. y) actually holds a node <A> that is already in the same namespace (i.e. “hello”). Even if the compiler were smart enough to figure this out for this case, it would be almost impossible to make it smart enough for cases (like the second code example above) where the source XML is not part of the program but comes from a file or a network packet.

Because the compiler doesn’t know what document the node <B> is going to end up inside, it ‘fully qualifies’ it with the default namespace of the code file (i.e. “hello”).
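One way to see this for yourself is to inspect a standalone literal. The following is a minimal sketch, assuming a console project that imports the same default namespace; the module name is made up for illustration. It prints the namespace of the element’s name and then lists the element’s attributes, using XAttribute.IsNamespaceDeclaration to flag any xmlns declaration attached to the node.

```vb
Imports <xmlns="hello">
Imports System
Imports System.Xml.Linq

' Hypothetical module name, used only for this illustration.
Module NamespaceAttributeSketch
    Sub Main()
        ' A literal created on its own, not embedded in another literal.
        Dim b = <B></B>

        ' The element's name is qualified with the imported default namespace.
        Console.WriteLine(b.Name.NamespaceName)

        ' List the attributes on the node and flag namespace declarations.
        For Each declaration In b.Attributes()
            Console.WriteLine("{0} = {1} (namespace declaration: {2})",
                              declaration.Name,
                              declaration.Value,
                              declaration.IsNamespaceDeclaration)
        Next
    End Sub
End Module
```

It is this explicit declaration on the node, rather than the element’s name, that shows up again in the output once the node is added to a parent that already declares the same namespace.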

Ok so the VB compiler can’t figure this out. Surely the LINQ to XML API can, can’t it?

Yes, the LINQ to XML API can figure out and remove duplicate namespaces. But it does not enforce the removal of duplicate namespaces by default. I think the reason it doesn’t is performance (i.e. there is a performance hit involved in checking each node to see whether it has any duplicate namespaces). After all, the XML is legal (albeit a bit ugly) even when it has duplicate namespaces, so why always force an extra namespace check?

In VS 2010 / .NET 4.0, the LINQ to XML API provides ways to work around this problem and generate better-looking XML. You can add a ‘SaveOptions’ value as an ‘annotation’ on the root XElement / XDocument node to tell the API not to emit duplicate namespaces, as demonstrated in the example below. The API then leaves out any unnecessary duplicate namespace declarations whenever the tree is written out without explicit options (for example by ToString, which is what Console.WriteLine uses here).

Alternatively, you can pass ‘SaveOptions’ / ‘ReaderOptions’ explicitly, as demonstrated in the examples below. In this case the in-memory XML still carries the duplicate namespace declarations, but the API does the extra work of removing them at the time the XML is saved or read.

Code:

Imports <xmlns="hello">

Module Module1
    Sub Main()
        Dim y = <A></A>
        y.AddAnnotation(SaveOptions.OmitDuplicateNamespaces)
        y.Add(<B></B>)
        y.Add(<C></C>)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

' If you wish to save the XML to a file
Module Module2
    Sub Main()
        Dim y = <A></A>
        y.Add(<B></B>)
        y.Add(<C></C>)

        y.Save("out.xml", SaveOptions.OmitDuplicateNamespaces)
    End Sub
End Module

' If you wish to create an XmlReader object for your XML
Module Module3
    Sub Main()
        Dim y = <A></A>
        y.Add(<B></B>)
        y.Add(<C></C>)

        Dim reader = y.CreateReader(ReaderOptions.OmitDuplicateNamespaces)
    End Sub
End Module

Output:

y:
<A xmlns="hello">
  <B></B>
  <C></C>
</A>
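The same option can be applied to the MSBuild scenario from earlier. What follows is only a sketch under stated assumptions: the module name is made up, the project path is taken from the first command-line argument rather than from the original example, and the file is assumed to contain at least one <PropertyGroup>. It relies on the XDocument.Save overload that accepts a SaveOptions value.

```vb
Imports <xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
Imports System
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical module name, used only for this illustration.
Module ProjectFileSketch
    Sub Main(args As String())
        ' Assumption: the path to the .vbproj file is passed as the first argument.
        Dim projectFile As String = args(0)
        Dim project = XDocument.Load(projectFile)

        ' The new node is created on its own, so it still carries
        ' its own xmlns declaration in memory.
        project.<Project>.<PropertyGroup>.First.Add(<NoWarn>42016</NoWarn>)

        ' Ask the API to leave out the redundant declaration when writing the file.
        project.Save(projectFile, SaveOptions.OmitDuplicateNamespaces)
    End Sub
End Module
```

Because this overwrites the project file in place, it is worth trying it on a copy first.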

Hope this helps clean up the XML for your apps! 🙂


Some references from MSDN:

XDocument.Save Method
SaveOptions Enumeration
Imports Statement (XML Namespace)

Ways to remove duplicate namespaces before VB 2010:

Bill McCarthy has a couple of blog posts about how you can clean up namespaces if you are using VS 2008 / .NET 3.5:
http://msmvps.com/blogs/bill/archive/2007/12/09/more-on-xml-namespaces-in-vb.aspx
http://msmvps.com/blogs/bill/archive/2007/11/24/cleaning-up-your-xml-literal-namespaces.aspx
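For a rough idea of what a manual cleanup can look like without the .NET 4.0 options, here is a minimal sketch. It is an assumption on my part and not necessarily the approach taken in the posts above; the module name is made up. It removes the namespace declaration attributes from the new node (and its descendants) before the node is added, using XAttribute.IsNamespaceDeclaration and the Remove extension method from System.Xml.Linq.

```vb
Imports <xmlns="hello">
Imports System
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical module name, used only for this illustration.
Module StripDeclarationsSketch
    Sub Main()
        Dim y = <A></A>
        Dim child = <B></B>

        ' Drop the xmlns declarations from the new subtree. The element names
        ' stay in the "hello" namespace; only the declaration attributes go away.
        child.DescendantsAndSelf().Attributes().
            Where(Function(a) a.IsNamespaceDeclaration).
            Remove()

        y.Add(child)
        Console.WriteLine(y)
    End Sub
End Module
```

Note that this strips every namespace declaration in the subtree, including prefixed ones, so it is best suited to cases like the ones in this post where the new nodes only use the default namespace of the parent.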
