Parsing XML

Aug 16, 2013 at 12:40 AM
Can we have a grammar for an XML parser in which tags are paired, and text and other tags can be nested between the opening and closing tag of a pair?
I tried every way I could think of, but there's always a shift/reduce conflict.
Coordinator
Aug 16, 2013 at 5:44 AM
The fact is that the XML grammar (see http://www.w3.org/TR/REC-xml/) is not a context-free grammar, and being context-free is a prerequisite for the large majority of parser generators. In essence, you can't match pairs of tags with the same name unless each particular name is directly specified within the grammar. Some tools may provide tricks for that, such as back references in regular expressions (which make them no longer regular).

What you can do is match opening and closing tags regardless of their names. This will match a superset of XML, and you will have to manually check the additional constraints (each closing tag carrying the same name as its opening tag) after the parsing phase. However, I can't help but feel this is not a great idea. There are already lots of very efficient specialized XML parsers. So unless you want to somehow mix XML with a new language, I would suggest using one of these.
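
To illustrate the two-phase idea, here is a rough sketch in Python rather than in any parser generator's grammar syntax. It is only an assumption of how this could look: the token pattern, the `Element` class, and the `parse` and `mismatches` functions are all hypothetical names made up for this example, and it handles only bare tags and text (no attributes, comments, or entities). The first phase pairs tags purely by nesting, the second phase checks the names afterwards.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified tokens: <name>, </name>, or a run of text.
TOKEN = re.compile(
    r"<(?P<open>[A-Za-z_][\w.-]*)>"
    r"|</(?P<close>[A-Za-z_][\w.-]*)>"
    r"|(?P<text>[^<]+)"
)


@dataclass
class Element:
    open_name: str
    close_name: str = ""
    children: list = field(default_factory=list)  # Element or str items


def parse(source):
    """Phase 1: pair tags by nesting only, ignoring their names."""
    root = Element("#root", "#root")
    stack = [root]
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group("open"):
            node = Element(m.group("open"))
            stack[-1].children.append(node)
            stack.append(node)
        elif m.group("close"):
            if len(stack) == 1:
                raise SyntaxError("closing tag without an opening tag")
            # Record the name, but do not compare it yet.
            stack.pop().close_name = m.group("close")
        else:
            stack[-1].children.append(m.group("text"))
    if len(stack) != 1:
        raise SyntaxError("unclosed tag")
    return root


def mismatches(node):
    """Phase 2: collect (open, close) name pairs that differ."""
    errors = []
    if node.open_name != node.close_name:
        errors.append((node.open_name, node.close_name))
    for child in node.children:
        if isinstance(child, Element):
            errors.extend(mismatches(child))
    return errors
```

With this sketch, `mismatches(parse("<a>hi<b>x</b></a>"))` returns an empty list, while `<a><b>x</a></b>` is accepted by the first phase and only rejected by the name check afterwards, which is exactly the "superset of XML" behaviour described above.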