SQL

Extracting data from XML in SQL SERVER

To simply put it, one can use a function "JSON.parse()" which helps to convert string to a JSON object and a function "parseFromString()" in the class "DOMParser" to convert a plain string to an XML object. In a similar fashion, we can parse a plain string to rowset provider just like a table or view in the database.

DECLARE @idoc int, @doc varchar(1000);  
SET @doc ='  
<ROOT>  
<Customer>
	<Name>Ajith Mohan</Name>  
	<Gender>Male</Gender>
	<Location>Mumbai</Location>
</Customer>  
<Customer>
	<Name>Anushree Mohan</Name>  
	<Gender>Female</Gender>
	<Location>Mumbai</Location>
</Customer>  
<Customer>
	<Name>Rajashri Mohan</Name>  
	<Gender>Female</Gender>
	<Location>Mumbai</Location>
</Customer> 
<Customer>
	<Name>Mohan Krishnan</Name>  
	<Gender>Female</Gender>
	<Location>Mumbai</Location>
</Customer> 
</ROOT>';  
--Create an internal representation of the XML document.  
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc;  
-- Execute a SELECT statement that uses the OPENXML rowset provider.  
SELECT    *  
FROM  OPENXML (@idoc, '/ROOT/Customer',2)  
      WITH (Name  varchar(50),  
            Gender varchar(10),
  		    Location varchar(10));
EXEC sp_xml_removedocument @idoc;  

The parsing of a plain string to rowsets in SQL Server involves two stages

  1. Create an internal representation of the XML document using sp_xml_preparedocument Procedure
  2. Execute a SELECT statement that uses the OPENXML rowset provider

1. Create an internal representation of the XML document

The procedure "sp_xml_preparedocument" parses the string and stores it in the internal cache of SQL Server using the MSXML parser, so if there is any formatting or syntactical errors it throws an error. The "sp_xml_preparedocument" procedure returns an ID as an output parameter, which can be used to point at the cached data.

A parsed document is stored in the internal cache of SQL Server. The MSXML parser uses one-eighth the total memory available for SQL Server. To avoid running out of memory, run sp_xml_removedocument to free up the memory.

2. Execute a SELECT statement that uses the OPENXML RowSet provider

Now that the data is available in the memory we can use the OPENXML to get a rowset view of the XML document stored in the internal cache. The OPENXML takes 3 arguments which are the id that was returned by the "sp_xml_preparedocument" procedure, rowpattern which is used to identify the node in the XML doc to be processed as rows and at last the flags which helps in mapping the XML data add the rowset, flags parameter is an optional parameter and has a default value of "0" which represents attribute-centric mapping, but in the above example, I have used element-centric mapping which is denoted by "2", which I have most widely encountered.

To know more about the mapping available please check the link.