Thursday, April 17, 2008

Simplifying libxml

As I mentioned in my previous post, XML handling on the iPhone is through the open source libxml library, which is a procedural C-based API. We can also use libxml in Cocoa, and if you have an eye toward re-using your code on the iPhone, it's probably not a bad idea to use libxml instead of NSXML. It also means I can show you how to use libxml without using the iPhone SDK and violating the NDA. 

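To use libxml in your Cocoa projects, add /usr/include/libxml2/** to the Header Search Paths build setting in your Xcode project settings. This tells your project where to find the header files for libxml. You also need to link against the library itself by adding -lxml2 to the Other Linker Flags build setting; without it, the project compiles but fails to link with an undefined symbol such as _xmlParseMemory.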

Now, say we want to use libxml to parse XML data contained in an NSString:
NSString *xml; // string containing XML
We need to import the libxml parser header, which declares xmlParseMemory() and also pulls in the tree API (xmlDocPtr, xmlNodePtr and friends):
#import <libxml/parser.h>
Next, we have to tell the library to parse this data. Being a procedural C library, libxml knows nothing about NSString, which is an Objective-C class cluster, so we have to convert our NSString into a C string, like this:
xmlDocPtr doc = xmlParseMemory([xml UTF8String], [xml lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);
If parsing succeeds, doc will be non-NULL. Now, how do we get values out of it? First, we get the root element like this:
xmlNodePtr root = xmlDocGetRootElement(doc);
Nodes in a libxml tree are linked together like an old-school doubly linked list: each xmlNode has next and prev pointers to its siblings, plus pointers to its parent and to its first and last children. That's a construct we don't use much in Objective-C, although it's likely used under the hood in the implementation of some of the collection classes. It's also different from how we commonly work in Objective-C. Ordinarily, we have an object that represents the collection, and we call methods on that collection object to get to the objects it contains. In a linked list like this, the xmlNode pointed at by an xmlNodePtr represents both the node itself and, through its pointers, the collection. A well-formed XML document has exactly one root element, so let's look at how we would get to the children of the root node.

It's actually pretty easy. We declare another node pointer and point it at the node's children like so:
xmlNodePtr node = root->children;
The children pointer gives us only the first child, but that node is also our way to reach all of its siblings (through next) and its own children (through children). To iterate through all the nodes at this level, we can loop like this:
xmlNodePtr cur_node;
for (cur_node = node; cur_node; cur_node = cur_node->next)
{
 // Do something
}
This loop keeps going until it gets to the last item in the linked list. The last item has NULL as its pointer to next, so the loop stops. Similarly, we can loop through a node's children like this:
xmlNodePtr children;
for (children = node->children; children; children = children->next)
{
 // Do something
}
To get a node's name, we just look at the name member of the xmlNode struct, which is a C string (typed as xmlChar *, libxml's name for unsigned char *). So, if we want to find a node with a specific name among a node's children, we would do it like this:
NSString *nameToSearchFor = @"Id";
xmlNode *child = NULL;
for (child = node->children; child; child = child->next)
{
    // Skip text nodes, then compare the element's name with the one we want
    if (child->type == XML_ELEMENT_NODE &&
        strcmp((const char *)child->name, [nameToSearchFor UTF8String]) == 0)
    {
        // Do something with the node
    }
}
Getting the value of a node, meaning the text between the begin and end tags in XML like this:
<node>value</node>
is a little more complicated, but not much. To obtain its value as a string:
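xmlChar *ret = xmlNodeListGetString(doc, node->children, 1);
Notice that we pass not the node, but the node's children pointer. This is because the text between the begin and end tags is itself a child node (a text node). The returned string is allocated by libxml, so after converting it with [NSString stringWithUTF8String:(const char *)ret], release it with xmlFree(ret). Since ret can be NULL (for example, when the element has no content), check for that before converting.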

At this point, I think you can see how mixing a procedural C API with Objective-C code looks a little ugly. On the other hand, I believe that Apple must have had a good reason not to port NSXML to the iPhone, probably having to do with performance, but that is just conjecture on my part. So, how can we make our code look nicer without imposing significant additional overhead?

Well, we can create a very low-overhead Objective-C wrapper. In the internals of the class, we use libxml for performance, but then convert to and from Objective-C objects in our accessors and mutators. This gives us a nice compromise between performance and readability, so our code can look like this without adding significant processing overhead:
 NKDLibXMLDocument *doc = [NKDLibXMLDocument documentWithRawXML:result];
NKDLibXMLNode *root = [doc rootNode];
NKDLibXMLNode *user = [root childNamed:@"User"];
NSLog(@"Id: %@", [user valueForChildNamed:@"Id"]);
NSLog(@"FirstName: %@", [user valueForChildNamed:@"FirstName"]);
NSLog(@"LastName: %@", [user valueForChildNamed:@"LastName"]);
Doesn't that look nicer? Isn't it easy to tell what's going on in that code? We hide all the nastiness away in a couple of classes and then never have to deal with it again. Yeah, that's the ticket.

I've still got some work to do on the wrapper class, but I'll post it in the next day or two.



26 comments:

LiquidIce said...

Thank you for posting this, it has been very helpful.

Brian said...

Wow, this is so helpful! Whenever you can post the wrapper, I'd be very grateful!

Brian said...

I found a small thing about your code:

You listed a string to find a node, but used a different string in the loops. For implementation, I renamed both to aName...

I'm having trouble getting the value of a node (can't include an example, they don't like the tags). I'm trying to use xmlNodeListGetString, but with little success. None, in fact.

Jeff LaMarche said...

Brian:

Which code snippet is wrong? Let me know and I'll correct it.

Thanks,
Jeff

Brian said...

I'm pretty sure that you have a string named nameToSearchFor, but then you use aName in the if statement. I tried renaming the first string to aName, and I think it worked... Pretty sure both strings are supposed to be the same, no matter what the name is.

NSString *nameToSearchFor = @"Id";
xmlNode *child = NULL;
for (child = node->children; child; child = child->next)
{
if (strcmp((char *)child->name, [aName cStringUsingEncoding:NSUTF8StringEncoding])==0)
{
// Do something with the node
}
}

Not sure, I could be missing something...

Brian said...

I've actually figured out how to use the xml parsing technique used in SeismicXML (the sample code provided on the Apple Dev site).

Jimmy "Turin" Reza said...

i get a syntax error on the first line

xmlDocPtr doc = xmlParseMemory([xml UTF8String], [xml lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);

is there something else i need to do besides the search paths?

Jimmy "Turin" Reza said...

i get syntax error before token "="
on these types of lines

xmlNode *child = NULL;

is there something i am missing?

Jimmy "Turin" Reza said...

got further.. put the code in AppDelegate.m file but this time i get _xmlParseMemory, referenced from
.. applicationDidFinishLaunching...
symbols not found..
collect2: ld returned 1 exit status

error..
any ideas

antonio valverde said...

i need your help: how do i read XML from a dictionary? the result is returned in a
dictionary and not in an NSString, so how can i
read it with libxml?

thanks

Ethan Vizitei said...

Thanks for the post, jeff, this was very helpful.

Yaghiyah said...

Hi all
xmlDocPtr doc = xmlParseMemory([xml UTF8String], [xml lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);


I get a linker error on the first line saying

"_xmlParseMemory", referenced from:

Can someone please help me?
P.S. I'm new to the Xcode IDE.

Thanks in advance.

Yaghiyah said...

Fixed the error; here is the solution:
Added "-lxml2" in the linker flags setting found in the project settings
section

Project now compiles successfully.

hope this helps anyone.

Rob M said...

Thanks for the -lxml2 hint. It worked for me, too and saved some time.

jaz said...

Thanks this is very helpful.

Did you ever write the wrapper you talk about in the post?

jaz said...

I found the code at google to be very helpful, it handles .zips with multiple files: http://code.google.com/p/ziparchive/

Ryan said...

That linker flag info was very helpful. Thanks!

Ross said...

I can verify that the linker flag hint works. Thanks for posting this, this was exactly what I was looking for!

AndyDORK said...

Hpple seems to have good promise..

There is a very nice, detailed, tutorial at: http://stackoverflow.com/questions/405749/parsing-html-on-the-iphone

Bond 007 said...

where is method ?

/*
Disable caching so that each time we run this app we are starting with a clean slate. You may not want to do this in your application.
*/
- (NSCachedURLResponse *)connection:(NSURLConnection *)connection willCacheResponse:(NSCachedURLResponse *)cachedResponse {
// Forward errors to the delegate.
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
// Called when a chunk of data has been downloaded.
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
- (void)appendCharacters:(const char *)charactersFound length:(NSInteger)length {

/*
This callback is invoked when the parse reaches the end of a node. At that point we finish processing that node,
if it is of interest to us. For "item" nodes, that means we have completed parsing a Song object. We pass the song
to a method in the superclass which will eventually deliver it to the delegate. For the other nodes we
care about, this means we have all the character data. The next step is to create an NSString using the buffer
contents and store that with the current Song object.
*/
static void endElementSAX(void *ctx, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI) {
/*
This callback is invoked when the parser encounters character data inside a node. The parser class determines how to use the character data.
*/
static void charactersFoundSAX(void *ctx, const xmlChar *ch, int len) {
/*
A production application should include robust error handling as part of its parsing implementation.
The specifics of how errors are handled depends on the application.
*/
static void errorEncounteredSAX(void *ctx, const char *msg, ...) {
