--------------------------------------------------------------- XMLIO: a C++ XML input/output library Copyright (c) 2000 Paul T. Miller LGPL DISCLAIMER This library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details. You should have received a copy of the GNU Library General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. http://www.gnu.org/copyleft/lgpl.html --------------------------------------------------------------- Modification History v0.1 PTM December 11, 1999 Basic support for ASCII XML callback-based parsing of hierarchical XML data v0.2 PTM December 16, 1999 Added intrinsic-data-type handlers Int, UnsignedInt, Float, Double, Bool, CString, List v0.3 PTM December 19, 1999 Reimplemented over expat using an event-queue model v0.4 PTM December 22, 1999 Removed expat and went back to a lighter-weight implementation v0.5 PTM January 1, 2000 Reimplemented in C with the C++ API layered over it Renamed to XML::Input Added handler chaining Switched to LPGL Added C version of sample objects and read test Added bool and list tests Moved sample code to subdirectory v0.6 PTM January 8, 2000 Added data (push access), comment, and CDATA handlers Added dump.c which copies one XML file to an output file and times it Added convert.cpp which uses XML::Input and XML::Output to mirror a file v0.7 PTM January 14, 2000 Fixed a bug which prevented empty elements from working Added accessors to determine if an element is empty Skip DOCTYPE chunks v0.8 PTM March 2, 2000 Fixed a bug in Data handlers causing the handler to be called too often Support single-quoted attributes (sorry about that) Removed anonymous struct in Element union - should make other compilers happy v0.9 PTM October 3, 2000 Removed anonymous union which kept it from compiling with GCC (oops). v0.91 PTM October 25, 2000 Fixed warnings about forward-declared structure typedefs v0.92 PTM January 8, 2001 Incorporated compiler warning fixes by Matthew Wall I want to clean up the std::vector and std::string stuff better for compilers that have namespaces disabled, but take a look at Matthew's changes to xmloutput in the contrib directory if you need to. --------------------------------------------------------------- XMLIO is a package to help read and write application data stored in XML format from within C or C++ applications. There are two main components: XML::Output - functions for writing nested elements and attributes XML::Input - nestable, streaming XML processor BACKGROUND I wrote this library because I wanted to store application data using a well-documented syntax (why invent my own when one already exists?), but I also wanted to make it trivial to generate and parse these files. Existing XML parsers tend to fall into two camps: event parsers DOM parsers Event parsers are too general purpose for what I wanted, and DOM parsers have to load the entire document into memory. What I needed was a streamable, event-based parser that allowed DOM-like processing of sub-trees in a nestable fasion. The interface to the parser is based somewhat on a parser for a proprietary hierarchical data-file syntax I created several years ago. It works well with object hierarchies and allows encapsulating the parsing of element subtrees by individual objects. Like an event parser, objects can register handlers for various elements, but it is trivial for one object to parse sub-objects, each parsing their own sub-objects with their own element handlers. Handlers for intrinsic data-types such as integers, booleans, floats, strings, and lists are also provided. Furthermore, a subclass can easily attach its own list of handlers to a chain of handlers defined by its parent class. USAGE The easiest way to see how the parser works is to look at a sample application. I've written a set of test "objects" that can read and write their own data. The implementation can be found in sample.cpp and sample.h. A vanilla C implementation of the equivalent functionality can be found in sample_c.c and sample_c.h. There are two sets of driver applications. testwrite.cpp (and the C version, testwrite_c.c) build an in-memory object model and then write the data to an XML file. testread.cpp (and the C version, testread_c.c) will read the XML file back in, constructing the in-memory object model using various element handlers. API documentation can be found in xmlinput.h and xmloutput.h. HANDLERS There are several types of built-in handlers. The C API uses macros to initialize the handlers (which contain unions), and generally uses object member offset and size information to store data for basic types. The C++ API uses XML::Handler() objects with proper type-checking, and can use pointers to destination addresses in addition to the object offset/size information. * Element This handler has two functions: 1. to look for a specific element and call a user-supplied callback function 2. to call a user-supplied callback function for any unhandled element This is the most common handler for dealing with nested objects. If userData is specified in the handler, that data will be passed to the callback. Otherwise, the userData passed to the element processor function will be used. C interface XML_ELEMENT_HANDLER_MEMBER(name, callback, object, member) XML_ELEMENT_HANDLER(name, callback, userData) name is the element name, or NULL to handle any element callback is the function to call of type XML_HandlerProc (see xmlinput.h) object, member is the struct and member name of the address you want passed as userData to the callback - this is useful if you want to branch to a handler for a child structure (see Point_Read and relevant code in sample_c.c). Note that you need to pass the address of the struct containing the child structure in the element process function. userData is the userData to pass to the callback, if it is static C++ interface XML::Handler(name, callback, userData) XML::Handler(name, callback, object, member) XML::Handler(callback, userData) XML::Handler(callback, object, member) name is the element name, and is required (use the alternate form that does not require a name for a generic handler) callback is the function to call of type XML::HandlerProc (see xmlinput.h) object, member is the class and member name of the address you want passed as userData to the callback - this is useful if you want to branch to a handler for a child class userData is the userData to pass to the callback - note that this can be the address of a class member if the handler list is not initialized statically. If the handler list is static, the object/member interface should be used While inside the handler callback function, a new list of handlers can be processed, or custom character data can be read, but not both. * Int Parse an integer from the element data and store it at the specified address, with optional range clamping C interface XML_INT_HANDLER(name, object, member, minVal, maxVal) C++ interface XML::Handler(name, value, minVal=0, maxVal=0) value is the int ADDRESS to place the result If minVal OR maxVal is not zero, then range clamping is performed on the parsed value. * UInt Parse an unsigned integer from the element data and store it at the specified address, with optional range clamping C interface XML_UINT_HANDLER(name, object, member, minVal, maxVal) C++ interface XML::Handler(name, value, minVal=0, maxVal=0) value is the unsigned int ADDRESS to place the result If minVal OR maxVal is not zero, then range clamping is performed on the parsed value. * Float Parse a float from the element data and store it at the specified address, with optional range clamping C interface XML_FLOAT_HANDLER(name, object, member, minVal, maxVal) object/member should be sizeof(float) bytes C++ interface XML::Handler(name, value, minVal=0, maxVal=0) value is the float ADDRESS to place the result If minVal OR maxVal is not zero, then range clamping is performed on the parsed value. * Double Parse a double from the element data and store it at the specified address, with optional range clamping C interface XML_DOUBLE_HANDLER(name, object, member, minVal, maxVal) object/member should be sizeof(double) bytes C++ interface XML::Handler(name, value, minVal=NULL, maxVal=NULL) value is the double ADDRESS to place the result NOTE: minVal and maxVal are POINTERS to the minimum and maximum double values to use. If minVal is not NULL, then clamping will be performed on the lower bound. If maxVal is not NULL, then clamping will be performed on the upper bound. * Bool Parse a boolean value from the element data and store it at the specified address. The element data should be either "True" or "False" (without quotes). If neither is found, then False is assumed. C interface XML_BOOL_HANDLER(name, object, member) object/member should be sizeof(char), sizeof(short), or sizeof(int) C++ interface XML::Handler(name, value) value is the bool ADDRESS to place the result * CString Parse a string from the element data, up to the ending element tag. No brackets can be found in the string, unless they are escaped. The string is always NULL-terminated C interface XML_STRING_HANDLER(name, object, member, maxLen) object/member should point to a string containing at least maxLen characters (sizeof(XML_Char) bytes in size) C++ interface XML::Handler(name, value, maxLen) value is the address of a string containing at least maxLen characters (sizeof(XML::Char) bytes in size) * List Parse a string from the element data and compare against an array of pointers to NULL-terminated strings specified in the handler data, storing the resulting index at the specified location C interface XML_LIST_HANDLER(name, object, member, list, listSize) object/member should be sizeof(char), sizeof(short), or sizeof(int) list is a pointer to an array of pointers to NULL-terminated strings listSize is the number of strings in the array C++ interface XML::Handler(name, value, list, listSize) value is the int ADDRESS to place the result list and listSize are the same for the C interface If no match is found, no value is stored and no error is raised. * Chain Hook another handler list into this list, allowing handler lists to be chained together up an object hierarchy. C interface XML_CHAIN_HANDLER(handlers, userData) XML_CHAIN_HANDLER_MEMBER(handlers, object, member) handlers is a const array of XML_Handler structures object, member is the struct and member name of the address you want passed as the base address for the chained handlers userData is the base address to use for the chained handlers C++ interface XML::Handler(handlers, userData) handlers is a const array of XML::Handler objects userData is the base address to use for the chained handlers Passing the correct base address to use for the chained handlers is very important. For example, if a struct contains a member structure for which there is a handler chain available to process, the address of the member structure should be passed as the userData in the handler specification for the chain. If the chain is for the members of a base class (with the same starting address as the current object), then NULL can be passed and the userData passed to the process function will be used (which is usually the object base address). * Data This handler gets called with the element data as a push-type interface. The prototype of the handler function is: XML_Error DataProc(const XML_Char *data, size_t len, void *userData) Since this function will be called with "chunks" of data, len will be 0 after all of the data has been sent. This provides a way to clean up once all the data has been received. * CData This handler gets called with CDATA as a push-type interface. The operation is the same as with a normal Data handler. * Comment This handler gets called with comments as a push-type interface. The operation is the same as with a normal Data handler. IMPLEMENTATION XML::Input allocates no memory while processing. This is the primary reason I did not implement it over expat or another parser. Because of this, reading and processing XML data is extremely time and memory efficient. The C++ implementation is built on top of the C implementation, with three major differences: 1. the API is in the XML namespace instead of using the XML_ prefix 2. element handlers take a "XML::Element &" instead of a "XML_Element *" 3. exceptions are used instead of error codes BUGS * I am sure there are bugs, but it "works for me" as it is now. * only ASCII encoding right now * no namespaces * no PI handlers AUTHOR Send me questions, comments, and bug reports at: paul@fxtech.com