Generate Serialization Classes As Part of Your Build (Part 2)

A while ago, I posted about integrating the xsd.exe tool into the build process within Visual Studio or MSBuild. This time around, we’ll look at generating the code for the XML Serialization object model and making a few tweaks as part of the process. Here I will specifically look at retrieving the comments in the xsd:documentation elements and applying them to code elements as XML comments. We’ll break this down into four key steps along with some closing comments and source code.

Creating the Schema

Before we start, we’ll need a schema to work with. For simplicity, we’ll use the same schema as last time with the addition of xsd:documentation elements to provide descriptions for some elements and attributes. The complex types are also no longer anonymous.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="SampleSchema"
          targetNamespace="http://schemas.example.com/SampleSchema.xsd"
          elementFormDefault="qualified"
          xmlns="http://schemas.example.com/SampleSchema.xsd"
          xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="rootType"/>
  <xs:complexType name="rootType">
    <xs:sequence>
      <xs:element name="first" type="firstType" />
      <xs:element name="second" type="secondType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="firstType">
    <xs:annotation>
      <xs:documentation>Describes the first element in the document.</xs:documentation>
    </xs:annotation>
    <xs:attribute name="name" type="xs:token">
      <xs:annotation>
        <xs:documentation>The name of the first element.</xs:documentation>
      </xs:annotation>
    </xs:attribute>
    <xs:attribute name="value" type="xs:token">
      <xs:annotation>
        <xs:documentation>The value of the first element.</xs:documentation>
      </xs:annotation>
    </xs:attribute>
  </xs:complexType>

  <xs:complexType name="secondType">
    <xs:attribute name="origin" type="xs:token" />
  </xs:complexType>
</xs:schema>

The descriptions here are nonsensical, but this example is small enough to demonstrate the power of what we can do with custom code generation. We still aim to generate classes at build time, but first we have to implement the logic that the build step will use. The next sections describe what we need to do.

Generating the Code

The first step is to import the schema into the code.

XmlSchemas schemaSet = new XmlSchemas();
XmlSchema xsd = null;

using (XmlReader reader = XmlReader.Create(mySourceXsdFile))
{
    xsd = XmlSchema.Read(reader, null);
    schemaSet.Add(xsd);
}

We create an instance of the System.Xml.Serialization.XmlSchemas class (in System.Xml.dll) and add the schema to it. In the code above, mySourceXsdFile is a field set in the constructor of the class that contains this code. The next step is to translate the schema definition into a CodeDom CodeNamespace and then export that CodeNamespace instance to code. Simple, right?

Fortunately for us, there are classes built into the framework to handle this. First we set up a schema importer and a code exporter:

// Load the XSD into the importer
XmlSchemaImporter importer = new XmlSchemaImporter(schemaSet);

// Set up the objects needed for code export.
CodeNamespace ns = new CodeNamespace(this.Namespace);
XmlCodeExporter exporter = new XmlCodeExporter(ns);

The CodeNamespace constructor takes a string naming the namespace for the generated code. A System.Xml.Serialization.XmlTypeMapping instance connects the importer and the exporter. To create the type mapping, we’ll need a System.Xml.XmlQualifiedName representing the root element of the schema. The exporter can then export that type mapping to a CodeDom representation, which we can use to write the code to a file. The result looks something like this:

XmlSchemaObjectCollection xsdObjects = xsd.Items;
int xsdObjectCount = xsdObjects.Count;
for (int i = 0; i < xsdObjectCount; ++i)
{
    XmlSchemaObject xsdObject = xsdObjects[i];

    XmlSchemaElement xsdElement = xsdObject as XmlSchemaElement;
    if (xsdElement != null)
    {
        // Import the mapping
        XmlTypeMapping mapping = importer.ImportTypeMapping(
            new XmlQualifiedName(xsdElement.Name, xsd.TargetNamespace)
        );

        exporter.ExportTypeMapping(mapping);
        break;
    }
}

// Build code generator
CSharpCodeProvider codeProvider = new CSharpCodeProvider();

using (StreamWriter sw = new StreamWriter("OutputFile.cs"))
{
    codeProvider.GenerateCodeFromNamespace(ns, sw, new CodeGeneratorOptions());
}

The code above iterates over the items in the schema until it finds the root element (an XmlSchemaElement), at which point the importer imports the type mapping and the exporter exports it to the CodeNamespace defined earlier. When the for loop ends, the CodeNamespace object contains all of the types representing the XML schema. Finally, we tell a CSharpCodeProvider to generate code from the types in the namespace and write it to an output file, ready for consumption. The code is very simple; however, we are not using the CodeDom APIs to their full potential. Because the types are accessible before the code is generated, we can make any changes we want before writing it out. This is the crux of this post: being able to customize the generated code.

Making Your Modifications

If you are familiar with the CodeDom APIs, then at this point you can probably start making your own modifications. But, just in case you aren’t, I will quickly walk through how to generate XML documentation comments on the generated code members. Re-examining the loop over the objects in the XML schema, we can reasonably infer based on our specific schema that there is more than one object in the collection. In this case, the other named complex types are part of this collection; consequently, we can modify the code in the loop to consider elements and attributes:

// Cached copy of the types as IEnumerable<T> to use LINQ queries on.
IEnumerable<CodeTypeDeclaration> codeTypes = null;

XmlSchemaObjectCollection xsdObjects = xsd.Items;
int xsdObjectCount = xsdObjects.Count;
for (int i = 0; i < xsdObjectCount; ++i)
{
    XmlSchemaObject xsdObject = xsdObjects[i];

    XmlSchemaElement xsdElement = xsdObject as XmlSchemaElement;
    if (xsdElement != null)
    {
        // Import the mapping
        XmlTypeMapping mapping = importer.ImportTypeMapping(
            new XmlQualifiedName(xsdElement.Name, xsd.TargetNamespace)
        );

        exporter.ExportTypeMapping(mapping);
        codeTypes = ns.Types.Cast<CodeTypeDeclaration>();
        continue;
    }

    XmlSchemaComplexType xsdComplexType = xsdObject as XmlSchemaComplexType;
    if (xsdComplexType != null)
    {
        string complexTypeName = xsdComplexType.Name;

        // We should throw (with First) if we can’t find a type with this complex type’s name.
        CodeTypeDeclaration codeTypeDeclaration =
            codeTypes.First(ct => ct.Name == complexTypeName);

        // Get documentation specifically for the complex type.
        string documentation = GetDocumentation(xsdComplexType);
        SetSummaryComment(codeTypeDeclaration, documentation);

        // Now get documentation for the attributes
        XmlSchemaObjectCollection attributes = xsdComplexType.Attributes;
        int attributeCount = attributes.Count;
        for (int j = 0; j < attributeCount; ++j)
        {
            XmlSchemaAttribute xsdAttribute = (XmlSchemaAttribute)attributes[j];

            string attributeName = xsdAttribute.Name;
            CodeMemberProperty property = codeTypeDeclaration.Members.OfType<CodeMemberProperty>().First(
                prop => prop.Name == attributeName);

            string attributeDocumentation = GetDocumentation(xsdAttribute);
            SetSummaryComment(property, attributeDocumentation);
        }
    }
}

There are a few significant changes here. First, immediately after the type mapping is exported, we cast all of the CodeNamespace’s types to CodeTypeDeclarations for use later in the loop. On subsequent iterations of the loop, we anticipate running into XmlSchemaComplexType objects. The code then retrieves the appropriate CodeTypeDeclaration which matches that complex type, retrieves the documentation for that type, and sets the summary comment. The pattern is repeated for the attributes in each complex type.
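One caveat about the loop above: codeTypes stays null until the root element has been processed, so it relies on the element appearing before the complex types in xsd.Items. That holds for our sample schema, but one way to remove the dependency is to finish the export first and match schema types to generated types by name in a separate pass. The following is only a sketch of that idea; DocumentationPairing and Pair are hypothetical names that are not part of the reference source.

```csharp
using System.CodeDom;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Schema;

public static class DocumentationPairing
{
    // Pairs each named top-level complex type in the schema with the generated
    // type of the same name. Intended to be called after ExportTypeMapping has
    // run, so the order of items in the schema no longer matters.
    public static IEnumerable<KeyValuePair<XmlSchemaComplexType, CodeTypeDeclaration>> Pair(
        XmlSchema xsd, CodeNamespace ns)
    {
        // Index the generated types by name once, up front.
        Dictionary<string, CodeTypeDeclaration> byName =
            ns.Types.Cast<CodeTypeDeclaration>().ToDictionary(t => t.Name);

        foreach (XmlSchemaComplexType complexType in xsd.Items.OfType<XmlSchemaComplexType>())
        {
            CodeTypeDeclaration declaration;

            // Skip complex types for which no type was generated instead of throwing.
            if (complexType.Name != null && byName.TryGetValue(complexType.Name, out declaration))
            {
                yield return new KeyValuePair<XmlSchemaComplexType, CodeTypeDeclaration>(
                    complexType, declaration);
            }
        }
    }
}
```

Each pair can then be passed to GetDocumentation and SetSummaryComment exactly as in the loop. Unlike First, this version silently skips complex types that have no generated counterpart; whether that is preferable to throwing depends on how strict you want the build to be.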

The GetDocumentation and SetSummaryComment methods are fairly simple and involve manipulating the object models of XML Schema and CodeDom, respectively, to achieve their goals. Fortunately for the GetDocumentation method, there is a base class in the object model which encapsulates all elements which can be annotated.

private static string GetDocumentation(XmlSchemaAnnotated annotatedElement)
{
    // Look inside the Annotation element
    XmlSchemaAnnotation annotation = annotatedElement.Annotation;
    if (annotation != null)
    {
        XmlSchemaObjectCollection annotationItems = annotation.Items;
        if (annotationItems.Count > 0)
        {
            XmlSchemaDocumentation xsdDocumentation =
                annotationItems[0] as XmlSchemaDocumentation;
            if (xsdDocumentation != null)
            {
                // Guard against a missing Markup array as well as an empty one.
                XmlNode[] markup = xsdDocumentation.Markup;
                if (markup != null && markup.Length > 0)
                {
                    return markup[0].InnerText;
                }
            }
        }
    }

    return String.Empty;
}

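The code above assumes that the xsd:documentation element is the first child of the xsd:annotation element. This is not a safe assumption for all schemas: an xsd:annotation may also contain xsd:appinfo elements, in any order, as well as more than one xsd:documentation element.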
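If your schemas do mix xsd:appinfo and xsd:documentation, a variant that scans every annotation item is not much longer. This is a minimal sketch rather than the code from the reference source; the AnnotationReader class name is hypothetical, and it simply returns the text of the first non-empty xsd:documentation it finds.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Schema;

public static class AnnotationReader
{
    // Returns the text of the first non-empty xsd:documentation element in the
    // annotation, or String.Empty when there is none.
    public static string GetDocumentation(XmlSchemaAnnotated annotatedElement)
    {
        XmlSchemaAnnotation annotation = annotatedElement.Annotation;
        if (annotation == null)
        {
            return String.Empty;
        }

        // OfType filters out xsd:appinfo items instead of assuming index 0.
        foreach (XmlSchemaDocumentation documentation in
            annotation.Items.OfType<XmlSchemaDocumentation>())
        {
            XmlNode[] markup = documentation.Markup;
            if (markup != null && markup.Length > 0)
            {
                // Join all markup nodes so mixed text and inline elements are kept.
                string[] parts = markup.Select(node => node.InnerText).ToArray();
                return String.Concat(parts).Trim();
            }
        }

        return String.Empty;
    }
}
```

Because XmlSchemaComplexType and XmlSchemaAttribute both derive from XmlSchemaAnnotated, this can be called from the loop in the same way as the original method.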

private static void SetSummaryComment(CodeTypeMember codeTypeMember, string comment)
{
    if (String.IsNullOrEmpty(comment))
    {
        return;
    }

    CodeCommentStatementCollection comments = codeTypeMember.Comments;

    if (comments.Count == 0)
    {
        comments.Add(new CodeCommentStatement(String.Empty, true));
    }

    comments[0].Comment.Text = String.Format(
        "<summary>{0} {1}{0} </summary>",
        Environment.NewLine,
        comment);
}
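The second argument to the CodeCommentStatement constructor marks the comment as a documentation comment, which is what makes the C# generator emit it with a /// prefix. To see that in isolation, here is a small sketch that is independent of the schema code; SummaryCommentDemo and Render are hypothetical names, and the type and namespace names are only placeholders.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class SummaryCommentDemo
{
    // Builds a one-type namespace, attaches a documentation comment to the type,
    // and returns the generated C# source as a string.
    public static string Render(string summary)
    {
        CodeTypeDeclaration type = new CodeTypeDeclaration("firstType");

        // docComment: true asks the generator for a /// comment rather than //.
        type.Comments.Add(
            new CodeCommentStatement("<summary>" + summary + "</summary>", true));

        CodeNamespace ns = new CodeNamespace("Demo");
        ns.Types.Add(type);

        using (StringWriter writer = new StringWriter())
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        {
            provider.GenerateCodeFromNamespace(ns, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Printing the result of SummaryCommentDemo.Render("Describes the first element in the document.") is a quick way to check how a comment will look before wiring it into the build.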

Integrating with MSBuild

The final step is to integrate this code generator with MSBuild. If you are familiar with creating custom tasks in MSBuild, this will be review for you. The steps are as follows:

  1. (Optional) Encapsulate the business logic laid out above into a separate, re-usable class.
  2. Create a class inheriting from Microsoft.Build.Utilities.Task (in Microsoft.Build.Utilities.dll or Microsoft.Build.Utilities.v3.5.dll).
  3. Create properties for each parameter that the task uses to accomplish its goal. (You may consider items like XsdFileLocation or DestinationFile.)
  4. Override the Execute method to carry out your custom logic, utilizing the properties as necessary.

A sample implementation could look like this:

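// Namespaces needed by the task: Exception, [Required], and Task respectively.
using System;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public class XsdGen : Task
{
    public string DestinationFile { get; set; }

    [Required]
    public string Namespace { get; set; }

    [Required]
    public string XsdLocation { get; set; }

    public override bool Execute()
    {
        try
        {
            // XsdCodeGenerator is the re-usable class from step 1.
            XsdCodeGenerator generator = new XsdCodeGenerator(XsdLocation, Namespace);
            generator.DestinationFile = DestinationFile;
            generator.Generate();
            return true;
        }
        catch (Exception ex)
        {
            // Report the failure to MSBuild without the stack trace.
            Log.LogErrorFromException(ex, false);
            return false;
        }
    }
}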

Once the task is created, we can go back to editing the project file as we did in the first post in the series. As a reminder, here is what we had before:

<Target Name="GenerateSerializationClasses" Inputs="SampleSchema.xsd" Outputs="SampleSchema.cs">
  <Exec Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe&quot; SampleSchema.xsd /classes" />
</Target>
<Target Name="BeforeBuild" DependsOnTargets="GenerateSerializationClasses">
</Target>

And here is what we have now:

<UsingTask AssemblyFile="DeWinter.XsdGen.dll" TaskName="XsdGen" />
<Target Name="GenerateSerializationClasses" Inputs="SampleSchema.xsd" Outputs="SampleSchema.g.cs">
  <XsdGen DestinationFile="SampleSchema.g.cs" Namespace="DeWinter.Samples.Serialization" XsdLocation="SampleSchema.xsd" />
</Target>
<Target Name="BeforeBuild" DependsOnTargets="GenerateSerializationClasses">
</Target>

Before the actual compilation step, MSBuild will execute the GenerateSerializationClasses target, locate the XsdGen task in DeWinter.XsdGen.dll, and then call the Execute method on that task. This will generate the XML Schema object model as desired (notice the XML comments below):

XML Schema Object Model

Let the Buyer Beware

The ironic part about writing this post is that the more I wrote, the less appealing this method became. Think about it: you are essentially writing a replacement for xsd.exe. The more general you want to make it, the more difficult it becomes. In this post I accounted for named complex types only. What about anonymous complex types? Complex types that are extensions or restrictions of other complex types? Other facets like enumerations? XML Schema has a multitude of different elements, and building a replacement for xsd.exe that suits your needs just isn’t worth it most of the time.

The other problem I have with it is just that there is so little gain to justify the amount of work you would put into it. My original justification for investigating this method was that it made the object model much nicer to work with, because it was well-commented and not as awkward to use as the normally generated OM (object model). But that’s really subjective!

My personal opinion, though, is that the object model generated by xsd.exe is fine. If you really want an object model that is "better" to work with (whatever "better" means to you), then build an abstraction layer on top of the generated OM and have a service class translate between the two. In my next post on serialization I’ll cover this.

Reference Source Code

This was my reference for this post, and there are quite a few more features included, like stripping suffixes from type names and "correcting" the casing of the generated classes. (Hmm, that is becoming a theme in my work now, isn’t it…?) But it is not complete and probably works only in very limited situations. I would advise you to use it for reference only. And of course, the disclaimer…

Disclaimer: This software is provided as is, and I am not responsible or liable for any damages arising in any way out of the use of this software, even if advised of the possibility of such damage.

Download

3 Responses to “Generate Serialization Classes As Part of Your Build (Part 2)”

  1. Bob Communion Says:

    Hello,

    I have been trying to cater for multiple xsd files in a project by modifying the build rules you put in part 1. I have come up with the following sample build file. The only problem I am having is that the classes are regenerated every time msbuild is run, which means I have to have the files checked out every time I do a build. Do you know how to get this to work incrementally?

    <?xml version="1.0" encoding="utf-8"?>
    <Project ToolsVersion="3.5" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <ItemGroup>
    <GenerateSerializationClass Include="DataSources\ChequeData.xsd" />
    <GenerateSerializationClass Include="DataSources\LoadDataResponse.xsd" />
    </ItemGroup>
    <Target
    Name="GenerateSerializationClasses"
    Inputs="@(GenerateSerializationClass)"
    Outputs="@(GenerateSerializationClass->'%(Directory)%(Filename).cs')">
    <Exec
    WorkingDirectory="DataSources"
    Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0a\Bin\xsd.exe&quot; %(GenerateSerializationClass.Filename)%(GenerateSerializationClass.Extension) /classes /n:MyCompanyName.Logic.DataSources"
    />
    </Target>
    <Target Name="Build" DependsOnTargets="GenerateSerializationClasses">
    </Target>
    </Project>

  2. David DeWinter Says:

    Hi Bob,

    In your GenerateSerializationClasses target’s Outputs, try using %(RelativeDir) instead of %(Directory). %(Directory) will give you the full directory path (without the drive letter) to your file, which MSBuild will treat as relative.

    An easy way to debug these sorts of problems is to increase the verbosity of your MSBuild output. To do that within VS, go to Tools > Options… > Projects and Solutions > Build and Run, and on the options page you should see a drop down for MSBuild project build output verbosity. Usually “Detailed” is enough to see something like this:

    Output file “Users\ddewinter\Documents\Projects\DBML Fixup\Implementation\DeWinter.DbmlFixup\DeWinter.DbmlFixup.Integration\Conceptual\DbmlFixupPreferences.cs” does not exist.
    Output file “Users\ddewinter\Documents\Projects\DBML Fixup\Implementation\DeWinter.DbmlFixup\DeWinter.DbmlFixup.Integration\Conceptual\DbmlSchema.cs” does not exist.

    Hope this helps!

    David

  3. Bob Communion Says:

    Thanks David,

    I changed my outputs to

    Outputs="@(GenerateSerializationClass->'%(RelativeDir)%(Filename).cs')"

    And it does build incrementally. Thanks for your help.
