Generate Serialization Classes As Part of Your Build (Part 1)
Yes, I am back from the grave. The past two months have been infuriatingly busy for me (and you’ll see why in a later post), but I finally have some time to write again. The topic? Generating serialization classes as part of your build process (Part 1) and in your own way (Part 2).
I might reach only a niche of .NET developers with this post, but it’s something that niche should be aware of. There is a tool in the .NET framework called xsd.exe, and one of its functions is to generate code, specifically .NET classes, from an XML Schema file (XSD). The tool decorates these classes with XML serialization attributes such that when .NET serializes an instance of the generated class that represents the root element, the output conforms to that XML schema. Often, developers using XML schemas will need to change them throughout the course of a project. Leveraging xsd.exe in a project thus requires the generated classes to be updated when an XML schema is updated. This is a bit dangerous in larger projects because not all developers may realize that the tool must be re-run, especially in the case of extremely minor changes to the schema.
Wouldn’t it be better to have a tighter integration among the XML schema, the generated classes, and the build environment? Fortunately, we can achieve this with a few simple steps, thanks to the MSBuild system for projects inside Visual Studio.
We’ll start from scratch here, so first things first: create a new project. Here I’ll just create a class library called "DeWinterXsd". Then add a new XML Schema file to the project. (You can find it under the Data category of items.)
Replace the contents of the schema file with the following snippet. It’s a very simple schema, featuring a root element and two child elements with three attributes among them.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="SampleSchema"
    targetNamespace="http://schemas.example.com/SampleSchema.xsd"
    elementFormDefault="qualified"
    xmlns="http://schemas.example.com/SampleSchema.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first">
          <xs:complexType>
            <xs:attribute name="name" type="xs:token" />
            <xs:attribute name="value" type="xs:token" />
          </xs:complexType>
        </xs:element>
        <xs:element name="second">
          <xs:complexType>
            <xs:attribute name="origin" type="xs:token" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
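For reference, here is a small instance document that should validate against the schema above. The attribute values are made up for illustration; only the element names, attribute names, and namespace come from the schema.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Child elements are namespace-qualified because elementFormDefault="qualified". -->
<root xmlns="http://schemas.example.com/SampleSchema.xsd">
  <!-- Attribute values below are hypothetical sample data. -->
  <first name="example" value="42" />
  <second origin="sample" />
</root>
```

This is roughly the shape of XML you would expect when serializing an instance of the generated root class.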
Now that you have the schema, add a new class to your project with the same name as your XML schema file. For example, the name of my schema file is "SampleSchema.xsd", and the name of the class is "SampleSchema.cs". This should nest the .cs file underneath the schema file, akin to how code-behind files are nested under UI files (e.g. Windows Forms, XAML). So how do we generate serialization classes? Right click on your project file and select "Unload Project".
And then right click on the project again and click "Edit xxx.csproj".
Now you’ll see the contents of the .csproj file itself. If you’re not familiar with the MSBuild system, then here’s the gist of it. Building a project in your solution executes a series of tasks, which are logically grouped into targets. Building actually requests only one target, Build. However, the Build target depends on other targets (BeforeBuild, the compilation targets, and AfterBuild among them), so those targets are run before Build completes, along with their own dependencies, and so on. Take a look at your output window sometime while building; it will show you the tasks being executed by each target. For more information, read an MSBuild tutorial, like this one.
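To make the dependency idea concrete, here is a minimal standalone MSBuild file, separate from your .csproj. The target names Prepare and Main are hypothetical; the point is only that requesting Main causes Prepare to run first.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Toy project file; the target names are made up for illustration. -->
<Project DefaultTargets="Main" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- Runs first, because Main lists it in DependsOnTargets. -->
  <Target Name="Prepare">
    <Message Text="Prepare target running" Importance="high" />
  </Target>
  <!-- The default target; MSBuild runs its dependencies before its own tasks. -->
  <Target Name="Main" DependsOnTargets="Prepare">
    <Message Text="Main target running" Importance="high" />
  </Target>
</Project>
```

If you save this under a name of your choosing and run msbuild on it from a Visual Studio Command Prompt, the two messages should appear in that order.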
What we’ll do now is say that before the compilation target occurs, we want to generate serialization classes using xsd.exe. Scroll to the bottom of the file and you should see this:
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
     Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
For now, we’ll uncomment the BeforeBuild target and insert a task to run xsd.exe. But since this is logically better off as its own target (e.g. GenerateSerializationClasses), we’ll specify that the BeforeBuild target depends on another target of our creation.
<Target Name="GenerateSerializationClasses">
</Target>
<Target Name="BeforeBuild" DependsOnTargets="GenerateSerializationClasses">
</Target>
I’ve placed the above lines right above the comments that discuss the BeforeBuild and AfterBuild targets ("To modify your build process, add your task…"). Now, to actually do something inside GenerateSerializationClasses, we need the Exec task, which, as its name implies, executes a command as if you had run it from the command line. Here we need to run xsd.exe, so we need to find where this tool is located. To do this easily, run "where xsd.exe" from a Visual Studio Command Prompt. My xsd.exe is in one of the Windows SDK directories, so I’ll add it to the Exec task like this:
<Target Name="GenerateSerializationClasses">
  <Exec Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe&quot; SampleSchema.xsd /classes" />
</Target>
Straightforward, right? The Exec task tells MSBuild to run xsd.exe using the specified command parameters, which will generate serialization classes based on my XSD file (SampleSchema.xsd). If you reload and rebuild your project, then you’ll see the serialization classes appear in the .cs file you provided. But there’s one more thing we can do here first! It makes sense to generate classes only when the schema file has actually changed, so we’ll need to add a few more attributes to the GenerateSerializationClasses target. The Inputs attribute specifies the input files to consider for the target, and the Outputs attribute specifies the output files to consider for the target. MSBuild compares the timestamps of these files to determine whether to run the target at all. Here, our logic is simple: if the schema file has been modified after the last modified time of the class file, then run the target, and that’s exactly what MSBuild will do. The final XML snippet:
<Target Name="GenerateSerializationClasses" Inputs="SampleSchema.xsd" Outputs="SampleSchema.cs">
  <Exec Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe&quot; SampleSchema.xsd /classes" />
</Target>
<Target Name="BeforeBuild" DependsOnTargets="GenerateSerializationClasses">
</Target>
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
     Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
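If you’d rather not hard-code the tool path inside the command, one possible variation is to pull it into a property. This is only a sketch: XsdExePath is a property name I made up, the path is the one from my machine, and the /namespace switch (a standard xsd.exe option) is added here just to show where extra switches would go, using the project name as the namespace.

```xml
<PropertyGroup>
  <!-- Hypothetical property; set it to whatever "where xsd.exe" reports on your machine. -->
  <XsdExePath>C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe</XsdExePath>
</PropertyGroup>
<!-- Same target as before, with the path factored out and a namespace switch added. -->
<Target Name="GenerateSerializationClasses" Inputs="SampleSchema.xsd" Outputs="SampleSchema.cs">
  <Exec Command="&quot;$(XsdExePath)&quot; SampleSchema.xsd /classes /namespace:DeWinterXsd" />
</Target>
```

Keeping the path in one property means a developer with a different SDK location only has to change a single line.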
Now, if you reload and rebuild your project, you can see the fruits of your labor:
We’ve generated the code, but it’s not the nicest output. Our classes have camel-cased names to match our element names, and the comments are non-existent for classes and publicly exposed members. Of course, we didn’t add any comments in our XML schema, but even if we used the xs:documentation element to document our elements and attributes, those comments would not be carried across to the generated code.
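For context, documenting a schema item with xs:documentation looks like the fragment below. The wording of the documentation text is made up; this is the kind of annotation that does not make it into the code xsd.exe generates.

```xml
<xs:element name="second">
  <xs:complexType>
    <xs:attribute name="origin" type="xs:token">
      <xs:annotation>
        <!-- Hypothetical description; xsd.exe does not copy this into the generated class. -->
        <xs:documentation>Identifies where the value came from.</xs:documentation>
      </xs:annotation>
    </xs:attribute>
  </xs:complexType>
</xs:element>
```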
Next time, I’ll show you how to remedy these and other problems by building your own MSBuild task that can generate the same quality of code as xsd.exe but with a few custom enhancements.
July 13th, 2008 at 9:28 pm
Good post. Thanks.
I’d like to ask a related question though - what is the recommended tool for generating serialization objects from a schema - xsd.exe or XSDObjGen.exe from Microsoft?
I know they generate vastly different code for the same given schema. Also, there have been times when one has worked and the other has not even when using similar command-line options. I would assume that the recommended tool to use is “xsd.exe” since it is part of the Framework SDK. However, if this was the case, why did MS ever develop “xsdobjgen.exe”?
TIA
July 14th, 2008 at 4:56 pm
From what I’ve seen, XSDObjGen was an alternative to xsd.exe around the time of .NET 1.1; apparently, xsd.exe couldn’t handle some schema constructs very well.
I tried to use XSDObjGen, but I didn’t want to install .NET 1.1. I did, however, find an update for it called XSDClassGen, located here. I compared the outputs of xsd.exe with the output of XSDClassGen.exe. Here are the differences I found, in order of the magnitude of their effect on the consumer:
1. XSDClassGen will construct the XML from public fields, not properties (although properties are generated). It does place an EditorBrowsableAttribute with EditorBrowsableState.Never on these fields, but you can still see them from the IntelliSense window if the generated code is in the same solution as the code you’re accessing it from.
2. XSDClassGen will generate collection constructs as List<T>, not a T[].
3. XSDClassGen can expose value types in the XSD as properties/fields of type Nullable<T>.
4. XSDClassGen can lazily initialize the fields when accessed through properties.
5. XSDClassGen does not place the DebuggerStepThroughAttribute on the generated classes.
6. XSDClassGen won’t put a CompilerGeneratedAttribute on your classes. This can be important for people who don’t want Code Analysis to analyze generated code.
Based on this, my choice for a “recommended tool” is xsd.exe, as it is officially supported by Microsoft and has probably been improved to account for cases where xsdobjgen.exe was useful. The fact that it was very difficult to find any up-to-date information on XSDObjGen.exe supports my choice as well.
July 23rd, 2008 at 10:02 am
Hello, David. Thanks so much for this post. As someone who is very new to Microsoft software development, .NET, C# and Visual Studio, and somewhat less new to XSD, I greatly appreciated this extremely helpful description.
I was wondering whether you had any advice for using XSD “include” in the scenario you describe above. In particular, I have a need to develop multiple schemata that share some common definitions. So, I’ve put the common defs into a separate file, which I then include into the top-level schemata.
I’m not sure exactly how to organize my projects (one class library, or multiple? if multiple, where to put the shared definitions, etc.), nor even how to get XSD include to work properly with xsd.exe.
Any additional advice you would have along these lines would be appreciated!
-Lane
July 28th, 2008 at 8:10 pm
Hi Lane,
Thanks for the compliments. Here are some thoughts.
xsd.exe does respect using xsd:include and will essentially merge your schemas into one large schema file before generating code from them. If you have one schema that includes all of the others (e.g. schema A includes schemas B, C, and D only), then you can run xsd.exe on that one and achieve the same results as laid out in this post. However…
If you have multiple schemas that include common schemas (e.g. schema A includes schemas B and C, and schema D includes schema B), then it will be a bit tougher for xsd.exe to generate classes. If you tell xsd.exe to run on schemas A and D, then the types in B will be generated twice, which is obviously not something you want.
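To picture the arrangement, here is a rough sketch of what schema A might look like when it includes schema B. The file names, the namespace URI, and the SharedType name are all hypothetical, and it assumes SchemaB.xsd declares the same target namespace (or none at all), which xs:include requires.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- SchemaA.xsd (hypothetical): pulls shared definitions in from SchemaB.xsd. -->
<xs:schema targetNamespace="http://schemas.example.com/Shared.xsd"
    elementFormDefault="qualified"
    xmlns="http://schemas.example.com/Shared.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="SchemaB.xsd" />
  <!-- SharedType is assumed to be defined in SchemaB.xsd. -->
  <xs:element name="rootA" type="SharedType" />
</xs:schema>
```

Schema D would carry the same xs:include line, which is why the types from schema B end up generated once for each top-level schema.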
As a workaround, you can specify different namespaces for the classes generated by schemas A and D. (You can use the /namespace switch on xsd.exe to do this.) You can then write code to translate between the schema B types in the first namespace and the second namespace if needed. This isn’t really an ideal solution though.
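As a sketch of that workaround in the build, the target could run xsd.exe once per top-level schema with a different namespace each time. The schema file names and namespace names here are hypothetical, the tool path is the one from the post, and I’ve left out Inputs and Outputs to keep it short.

```xml
<Target Name="GenerateSerializationClasses">
  <!-- Types from the shared schema land in both namespaces, so the class names no longer collide. -->
  <Exec Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe&quot; SchemaA.xsd /classes /namespace:Example.SchemaA" />
  <Exec Command="&quot;C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\xsd.exe&quot; SchemaD.xsd /classes /namespace:Example.SchemaD" />
</Target>
```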
Another workaround is to build your own version of xsd.exe which can ignore generating certain types. The end result would be that running this modified xsd.exe on schema A would yield types in schema B, but running it on schema D would not. This is difficult but will be the subject of part 2 of this series, so stay tuned!
David
August 28th, 2008 at 9:24 am
Does anyone know where to download this XSD2ClassGen? The link above points to a page whose download link is broken. Does anyone have this lying around and can post it to the web?