When working with data, you will often find that it comes in
an XML format, so we have to serialize and deserialize it in order to use it.
There are several ways of doing that, for example DOM, XQuery and XSLT. DOM is
the oldest of the three, but it can still do the work. XQuery and XSLT are not
very easy to use and take some time to master. .NET 3.5 brought a big improvement
to the programming model with LINQ (Language-Integrated Query). It can be
used with objects, databases and XML.
Introduction
LINQ to XML allows us to create, read and
edit XML files and, more importantly, it does so in a very easy and
understandable way. To use LINQ to XML you should add a reference to
System.Xml.Linq.dll, import the System.Xml.Linq namespace (for XDocument and
the other classes) and import the System.Linq namespace (for the LINQ query operators).
Important methods of XElement and XDocument
- Add(object content) – adds the new content as the last child of the element
- Remove() – removes this element from its parent
- Descendants( XName name ) – returns a collection of all descendants of this element whose names match the argument, in document order
- Element( XName name ) – returns the first child element that has a matching name
- Elements( XName name ) – returns a collection of all child elements whose names match the argument, in document order
- Nodes() – returns a collection of all child nodes of the current element in document order
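To get a feel for how these methods differ, here is a minimal sketch. The library, book and magazine element names are made up for this example and are not part of the data used in the rest of the article.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XElementMethodsDemo
{
    // Builds a small tree to experiment with (element names are invented for this sketch)
    public static XElement BuildTree()
    {
        return new XElement("library",
            new XElement("book", new XElement("title", "First")),
            new XElement("book", new XElement("title", "Second")),
            new XElement("magazine", new XElement("title", "Third")));
    }

    public static void Main()
    {
        XElement library = BuildTree();

        // Add: appends the new element after the existing children
        library.Add(new XElement("book", new XElement("title", "Fourth")));

        // Element: the first direct child named "book"
        Console.WriteLine(library.Element("book").Element("title").Value);  // First

        // Elements looks at direct children only, Descendants at any depth
        Console.WriteLine(library.Elements("book").Count());      // 3
        Console.WriteLine(library.Elements("title").Count());     // 0
        Console.WriteLine(library.Descendants("title").Count());  // 4

        // Remove: detaches the element from its parent
        library.Element("magazine").Remove();
        Console.WriteLine(library.Nodes().Count());               // 3
    }
}
```

The difference between Elements and Descendants is the one that most often causes surprises: the title elements are grandchildren of library, so only Descendants finds them.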
Creating an XML file
So let’s have a simple class Person:
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Location Address { get; set; }
}

public class Location
{
    public string Country { get; set; }
    public string City { get; set; }
}
And we create an object of type Person:
Person p1 = new Person()
{
    FirstName = "Martin",
    LastName = "Mihaylov",
    Address = new Location()
    {
        City = "Sofia",
        Country = "Bulgaria"
    }
};
Now let’s try to create an XML tree from it. For
that purpose we use XElement and XAttribute objects:
XElement persons =
    new XElement( "persons",
        new XElement( "person",
            new XElement( "firstName", p1.FirstName ),
            new XElement( "lastName", p1.LastName ),
            new XElement( "address",
                new XAttribute( "city", p1.Address.City ),
                new XAttribute( "country", p1.Address.Country ) ) ) );
We simply create a “persons” element using
the XElement class and then nest other elements in it. We can also add
attributes to the elements thanks to the XAttribute class. Finally, we can use
the XDeclaration class to define the XML declaration of our document and XComment to add a
comment to it. Here it is:
XDocument myXml = new XDocument( new XDeclaration( "1.0", "utf-8", "yes" ),
new XComment( "A Comment in the XML" ), persons );
So the final output, once the document is saved to a file, should look like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--A Comment in the XML-->
<persons>
  <person>
    <firstName>Martin</firstName>
    <lastName>Mihaylov</lastName>
    <address city="Sofia" country="Bulgaria" />
  </person>
</persons>
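If you want to see this output for yourself, keep in mind that ToString() on an XDocument leaves out the XML declaration, while Save() writes it. The sketch below rebuilds the same document with literal values; the file name persons.xml is just an assumption for the example.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class SaveDemo
{
    // Rebuilds the article's document with literal values instead of the p1 object
    public static XDocument BuildDocument()
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XComment("A Comment in the XML"),
            new XElement("persons",
                new XElement("person",
                    new XElement("firstName", "Martin"),
                    new XElement("lastName", "Mihaylov"),
                    new XElement("address",
                        new XAttribute("city", "Sofia"),
                        new XAttribute("country", "Bulgaria")))));
    }

    public static void Main()
    {
        XDocument doc = BuildDocument();

        // ToString() returns the comment and the elements, but not the declaration
        Console.WriteLine(doc.ToString());

        // Save() writes the declaration too ("persons.xml" is a made-up file name)
        doc.Save("persons.xml");
        Console.WriteLine(File.ReadAllText("persons.xml"));
    }
}
```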
Adding an element to the XDocument
First we find the element we want to add
something to and then we use its Add method to add our new element. Here is an
example, where p2 is a second Person object created the same way as p1:
myXml.Element( "persons" ).Add( new XElement( "person",
    new XElement( "firstName", p2.FirstName ),
    new XElement( "lastName", p2.LastName ),
    new XElement( "address",
        new XAttribute( "city", p2.Address.City ),
        new XAttribute( "country", p2.Address.Country ) ) ) );
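Since the same nested constructor calls are needed for every person, one option is to move them into a small helper. This is only a sketch: ToXElement is a hypothetical helper name, and the values used for the second person are invented.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public class Location
{
    public string Country { get; set; }
    public string City { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Location Address { get; set; }
}

public static class AddPersonDemo
{
    // Hypothetical helper: maps a Person to the <person> layout used in the article
    public static XElement ToXElement(Person p)
    {
        return new XElement("person",
            new XElement("firstName", p.FirstName),
            new XElement("lastName", p.LastName),
            new XElement("address",
                new XAttribute("city", p.Address.City),
                new XAttribute("country", p.Address.Country)));
    }

    public static void Main()
    {
        XDocument myXml = new XDocument(new XElement("persons"));

        // Sample values, made up for this example
        Person p2 = new Person
        {
            FirstName = "Jane",
            LastName = "Doe",
            Address = new Location { City = "London", Country = "United Kingdom" }
        };

        myXml.Element("persons").Add(ToXElement(p2));
        Console.WriteLine(myXml);
    }
}
```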
Removing an element from the XDocument
To remove an element or attribute you must
navigate to the desired element and then call its Remove method:
myXml.Element( "persons" ).Element( "person" ).Remove();
This will remove the first element with
name “person” in “persons”.
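Attributes have a Remove method of their own, and LINQ to XML also provides a Remove extension method that deletes every element in a sequence. The following sketch shows both on sample data invented for the example; RemoveByCountry is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RemoveDemo
{
    // Sample data made up for this sketch
    public static XElement BuildPersons()
    {
        return XElement.Parse(
            "<persons>" +
            "<person><firstName>Martin</firstName><address city=\"Sofia\" country=\"Bulgaria\" /></person>" +
            "<person><firstName>Jane</firstName><address city=\"London\" country=\"United Kingdom\" /></person>" +
            "<person><firstName>Ivan</firstName><address city=\"Varna\" country=\"Bulgaria\" /></person>" +
            "</persons>");
    }

    // Removes every <person> whose address has the given country
    public static void RemoveByCountry(XElement persons, string country)
    {
        persons.Elements("person")
               .Where(p => (string)p.Element("address").Attribute("country") == country)
               .Remove();
    }

    public static void Main()
    {
        XElement persons = BuildPersons();

        // XAttribute.Remove(): drop the city attribute of the first person
        persons.Element("person").Element("address").Attribute("city").Remove();

        // Remove() on a sequence: delete all matching elements in one call
        RemoveByCountry(persons, "Bulgaria");

        Console.WriteLine(persons);
    }
}
```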
Reading an XML file
Before reading you should load your XML
into an XElement or XDocument object. This can be done with the Load method,
which accepts a file name or URI, a TextReader or an XmlReader. XML markup that is
already held in a string is read with the Parse method instead. Here is an example:
XDocument myXml = XDocument.Load( "MyXML.xml" );
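When the XML is already in memory, XDocument.Parse reads it from a string, while the Load overload that takes a TextReader can be fed a StringReader. A minimal sketch, using a made-up XML snippet:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class LoadDemo
{
    // Made-up sample markup
    public const string Xml =
        "<persons><person><firstName>Martin</firstName></person></persons>";

    public static XDocument FromReader(string xml)
    {
        // Load accepts any TextReader; a StringReader wraps the string
        using (TextReader reader = new StringReader(xml))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        // Parse reads XML markup directly from a string
        XDocument fromString = XDocument.Parse(Xml);
        XDocument fromReader = FromReader(Xml);

        Console.WriteLine(fromString.Root.Name);                                 // persons
        Console.WriteLine(fromReader.Root.Element("person").Element("firstName").Value);  // Martin
    }
}
```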
Now let’s try to read the contents of an
XML file. For this example we use the XML document we created at the
beginning of the article. Thanks to LINQ we can use the standard query
keywords: from, in, where, select. That makes it fairly easy to pull the
information you need out of an XML file:
List<Person> personsList =
    ( from person in myXml.Descendants( "person" )
      where (( string )person.Element( "address" ).Attribute( "country" )).Equals( "Bulgaria" )
      select new Person()
      {
          FirstName = person.Element( "firstName" ).Value,
          LastName = person.Element( "lastName" ).Value,
          Address = new Location()
          {
              City = person.Element( "address" ).Attribute( "city" ).Value,
              Country = person.Element( "address" ).Attribute( "country" ).Value
          }
      } ).ToList();
The Descendants method returns all descendant
elements that are named “person” (in our case). Then, from each of them that
has an "address" element with its "country" attribute set to "Bulgaria",
we create a new object of type Person and set its properties. The output is a
list of Person objects.
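The query above assumes that every person has an address element with a country attribute; if one is missing, Element or Attribute returns null and the query throws. One possible way around that, sketched here on invented sample data, is to rely on the explicit string conversions of XElement and XAttribute, which return null for a missing node. FirstNamesIn is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SafeQueryDemo
{
    // Sample data invented for this sketch; the second person has no <address>
    public const string Xml =
        "<persons>" +
        "<person><firstName>Martin</firstName><address city=\"Sofia\" country=\"Bulgaria\" /></person>" +
        "<person><firstName>Jane</firstName></person>" +
        "</persons>";

    public static List<string> FirstNamesIn(XContainer root, string country)
    {
        return (from person in root.Descendants("person")
                // Elements(...).Attributes(...) yields an empty sequence when <address> is absent
                let found = (string)person.Elements("address").Attributes("country").FirstOrDefault()
                where found == country
                // Casting a missing element to string gives null rather than an exception
                select (string)person.Element("firstName") ?? String.Empty).ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(Xml);
        foreach (string name in FirstNamesIn(doc, "Bulgaria"))
            Console.WriteLine(name);   // Martin
    }
}
```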
Query your data
Notice how the query uses the XDocument.Descendants() method. That method looks
through the XDocument and all of the elements nested inside it (the descendants) and
returns them in document order. When you pass it a name, it filters the list.
One important thing to keep in mind about XDocument.Descendants() is that it
uses deferred execution. That means it returns a sequence (an IEnumerable<XElement>, to be
specific), but it doesn't actually descend through the XML document and find
all of the descendants until the sequence is enumerated. If you use a
foreach loop to iterate through the descendants, each iteration only reads as far as
the next descendant.
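A quick way to see deferred execution in action is to change the tree after the query has been created. In this sketch (element values are made up) the query picks up elements that were added later, while a ToList() snapshot does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Returns { count seen by the query, size of the snapshot, count seen by the query later }
    public static int[] Run()
    {
        XElement persons = new XElement("persons", new XElement("person", "first"));

        // Nothing is searched yet; the query only remembers what to look for
        IEnumerable<XElement> query = persons.Descendants("person");

        persons.Add(new XElement("person", "second"));
        int seenByQuery = query.Count();           // enumerated now, so it sees both elements

        // ToList() enumerates immediately and keeps a fixed copy of the results
        List<XElement> snapshot = query.ToList();

        persons.Add(new XElement("person", "third"));
        return new[] { seenByQuery, snapshot.Count, query.Count() };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]);   // 2
        Console.WriteLine(counts[1]);   // 2
        Console.WriteLine(counts[2]);   // 3
    }
}
```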
Using LINQ to read XML data from an RSS feed
You can do some pretty powerful things with
LINQ to XML, because so much data is stored and transmitted as XML. Like RSS
feeds, for example! Open up any RSS feed - like this one from our blog,
Building Better Software - and view its source, and you'll see XML data. And
that means you can read it into an XDocument and query it with LINQ.
One nice thing about the XDocument.Load()
method is that when you pass it a string, you're giving it a URI. A lot of the
time, you'll just pass it a simple filename. But a URL will work equally well.
Here's how you can read the title of a blog from its RSS feed, using the
<rss>, <channel>, and <title> tags:
XDocument ourBlog = XDocument.Load("http://www.stellman-greene.com/feed");
Console.WriteLine(ourBlog.Element("rss").Element("channel").Element("title").Value);
That means it's easy to write a LINQ to XML
query to read data from an RSS feed. Here's how we'll do it:
1. Create a new console application.
2. Make sure you've got using System.Xml.Linq; at the top of the code.
3. We'll use XDocument.Load() to load the XML data from the URL.
4. A simple LINQ query can extract the articles into instances of a Post class that we'll create.
5. Instead of using anonymous types, the select new clause will select new Post objects.
When you use the XDocument.Element()
method, you're really calling the Element() method of its base
class, XContainer.
The XElement class that we used earlier also extends XContainer, and the
Element() method returns an XElement, so its result can be used as an XContainer too.
We'll take advantage of that by creating a
Post class with a constructor that takes an XContainer object and uses its
Element() method to get values. Note its GetElementValue() method that either
returns an element's Value or, if that element doesn't exist, returns an empty
string. (Again, remember to add using System.Xml.Linq; to the top of
the code, for both this and the Main() method
below!)
class Post
{
    public string Title { get; private set; }
    public DateTime? Date { get; private set; }
    public string Url { get; private set; }
    public string Description { get; private set; }
    public string Creator { get; private set; }
    public string Content { get; private set; }

    // Returns the value of the named child element, or an empty string if it is missing
    private static string GetElementValue(XContainer element, string name)
    {
        if ((element == null) || (element.Element(name) == null))
            return String.Empty;
        return element.Element(name).Value;
    }

    public Post(XContainer post)
    {
        // Get the string properties from the post's element values
        Title = GetElementValue(post, "title");
        Url = GetElementValue(post, "guid");
        Description = GetElementValue(post, "description");
        Creator = GetElementValue(post,
            "{http://purl.org/dc/elements/1.1/}creator");
        Content = GetElementValue(post,
            "{http://purl.org/rss/1.0/modules/content/}encoded");

        // Date is a nullable DateTime: if the pubDate element can't be
        // parsed into a valid date, the Date property is left as null
        DateTime result;
        if (DateTime.TryParse(GetElementValue(post, "pubDate"), out result))
            Date = result;
    }

    public override string ToString()
    {
        // Missing elements come back as empty strings, so test for empty as well as null
        return String.Format("{0} by {1}",
            String.IsNullOrEmpty(Title) ? "no title" : Title,
            String.IsNullOrEmpty(Creator) ? "Unknown" : Creator);
    }
}
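Did you notice how the Post constructor
uses "{http://purl.org/dc/elements/1.1/}creator" as
the name for creator? If you go back to the RSS feed source and search for
"creator", you'll find a tag that looks like this:
<dc:creator>Andrew Stellman</dc:creator>
See that "dc:"? At the top of the
feed, the <rss> tag has this attribute:
xmlns:dc="http://purl.org/dc/elements/1.1/"
That attribute maps the dc prefix to a namespace. LINQ to XML identifies the element
by that namespace rather than by the prefix, and writes the full name with the
namespace in curly braces in front of the local name:
{http://purl.org/dc/elements/1.1/}creator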
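If you'd rather not type the curly-brace form, the XNamespace class can build the same name for you. Here's a minimal sketch that reads dc:creator from a cut-down item; the item markup and the ReadCreator helper are made up for the example, and only the namespace URI comes from the feed.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // A cut-down <item> with invented content
    public const string ItemXml =
        "<item xmlns:dc=\"http://purl.org/dc/elements/1.1/\">" +
        "<title>Sample post</title>" +
        "<dc:creator>Andrew Stellman</dc:creator>" +
        "</item>";

    public static string ReadCreator(XElement item)
    {
        XNamespace dc = "http://purl.org/dc/elements/1.1/";
        // dc + "creator" is the same XName as "{http://purl.org/dc/elements/1.1/}creator"
        return (string)item.Element(dc + "creator");
    }

    public static void Main()
    {
        XElement item = XElement.Parse(ItemXml);
        Console.WriteLine(ReadCreator(item));            // Andrew Stellman

        XNamespace dc = "http://purl.org/dc/elements/1.1/";
        XName expanded = "{http://purl.org/dc/elements/1.1/}creator";
        Console.WriteLine(expanded == dc + "creator");   // True

        // Searching by the prefixed local name alone finds nothing
        Console.WriteLine(item.Element("creator") == null);   // True
    }
}
```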
Now you're ready for the LINQ query. Notice
how it uses select new Post(post) to pass each XElement returned by ourBlog.Descendants("item")
into the Post constructor.
static void Main(string[] args)
{
    // Load the blog posts and print the title of the blog
    XDocument ourBlog = XDocument.Load("http://www.stellman-greene.com/feed");
    Console.WriteLine(ourBlog
        .Element("rss")
        .Element("channel")
        .Element("title")
        .Value);

    // Query <item>s in the XML RSS data and select each one into a new Post()
    IEnumerable<Post> posts =
        from post in ourBlog.Descendants("item")
        select new Post(post);

    // Print each post to the console
    foreach (var post in posts)
        Console.WriteLine(post.ToString());
}
When you run your program, it connects to
the blog, retrieves the RSS feed, and prints the list of articles to the
console.
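If you want to try the query without a network connection, you can run the same kind of query against a feed held in a string. This sketch uses a hand-written stand-in for a feed (the blog title, post titles and author are invented) and a hypothetical Summaries helper in place of the Post class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OfflineFeedDemo
{
    // Hand-written stand-in for an RSS feed; all values are invented
    public const string SampleRss =
        "<rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">" +
        "<channel><title>Sample Blog</title>" +
        "<item><title>First post</title><dc:creator>Alice</dc:creator></item>" +
        "<item><title>Second post</title></item>" +
        "</channel></rss>";

    // Builds one "title by creator" line per <item>
    public static List<string> Summaries(XDocument feed)
    {
        XNamespace dc = "http://purl.org/dc/elements/1.1/";
        return (from item in feed.Descendants("item")
                select String.Format("{0} by {1}",
                    (string)item.Element("title") ?? "no title",
                    (string)item.Element(dc + "creator") ?? "Unknown")).ToList();
    }

    public static void Main()
    {
        XDocument feed = XDocument.Parse(SampleRss);
        Console.WriteLine(feed.Element("rss").Element("channel").Element("title").Value);
        foreach (string line in Summaries(feed))
            Console.WriteLine(line);
    }
}
```

With the sample feed above, this prints the blog title followed by "First post by Alice" and "Second post by Unknown".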