October 16th, 2002:

PHP’s xml_parse_into_struct is broken

PHP’s XML function xml_parse_into_struct() is causing me some trouble. I use it in my xmlParser class to break the document down into a basic struct, which I then walk to build the struct I actually want. That is easier than registering handler functions for all of the incoming data through the expat API. Unfortunately, I believe there is a bug in it: it decodes the core XML entities automatically. That is undesirable because, as far as I can see, there is no way to tell it about additional entities that need decoding, so I’d rather leave all of the decoding for a later step. It gets worse. Entities that are not in the XML core are simply thrown away.
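To make the automatic decoding concrete, here is a minimal sketch. The sample document and the variable names are made up for illustration and are not taken from xmlParser. It only covers the core entities, since what happens to other entities depends on the document and on the XML library PHP was built against.

```php
<?php
// Hypothetical sample document that uses only core XML entities.
$xml = '<title>Fish &amp; Chips &lt;special&gt;</title>';

$parser = xml_parser_create();
// Keep tag names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$values = array();
$index  = array();
xml_parse_into_struct($parser, $xml, $values, $index);

// The character data comes back already decoded; the struct keeps no
// record of which characters were written as entity references.
$title = $values[0]['value'];
echo $title, "\n"; // prints: Fish & Chips <special>
```

Once the value is in the struct, a literal ampersand and one written as &amp;amp; look the same, so a later decoding step has nothing left to work with.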

Thierry Malo (sorry, I don’t have his website URL) informed me of this when he attempted to use feedParser (which uses xmlParser) on a feed with international character entities in some of the titles.

Since the expat parser is non-validating, I don’t expect it to pick up all the entities and decode them. However, one would expect it either to let you tell it about additional entities or to leave every entity undecoded.

I guess I’ll have to rewrite xmlParser to not use xml_parse_into_struct().
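A rough sketch of what the handler-based approach might look like follows. The TextCollector class and its method names are hypothetical and are not part of xmlParser. It relies on the assumption that the parser passes entity references it cannot resolve to the default handler; whether that actually happens depends on the document and on the PHP build, so treat this as a starting point rather than a fix.

```php
<?php
// Hypothetical collector; the class and method names are illustrative only.
class TextCollector
{
    public $text = '';

    public function startElement($parser, $name, $attrs) {}
    public function endElement($parser, $name) {}

    // Character data, with the core entities already decoded by the parser.
    public function characterData($parser, $data)
    {
        $this->text .= $data;
    }

    // Receives whatever has no dedicated handler (comments, declarations,
    // and so on). Keep only things that look like an entity reference,
    // verbatim, so a later step can decode them.
    public function fallback($parser, $data)
    {
        if (preg_match('/^&[^;\s]+;$/', $data)) {
            $this->text .= $data;
        }
    }
}

$collector = new TextCollector();
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    array($collector, 'startElement'),
    array($collector, 'endElement')
);
xml_set_character_data_handler($parser, array($collector, 'characterData'));
xml_set_default_handler($parser, array($collector, 'fallback'));

// Hypothetical sample input; xml_parse() returns 1 on success.
$ok = xml_parse($parser, '<title>Fish &amp; Chips</title>', true);
echo $collector->text, "\n";
```

This still leaves the core entities decoded, since that happens inside the parser itself, but it at least gives the unknown ones a chance to survive.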