Handling flat XML files: fxml_importXMLFlat (flatxml package)

Reads an XML document into a flat dataframe structure.

fxml_importXMLFlat(path)

Arguments

path	Path to the XML document. Can be either a local path or a URL.

Value

A dataframe containing the XML document in a flat structure. See the Details section for more information on its structure.

Details

The XML document is parsed and stored in a dataframe structure (flat XML). The first four columns of a flat XML dataframe are standard columns. Their names all end with a dot. These columns are:

elem.: The element identifier of the current XML element (without the tag delimiters < and >).
elemid.: A unique, ascending numerical ID for each XML element. The first XML element is assigned 1 as its ID. This ID is used by many of the flatxml functions.
attr.: Name of an attribute. For each attribute of an XML element the dataframe will have an additional row.
value.: The value of either the attribute (if attr. is not NA) or the element itself (if attr. is NA). value. is NA if the element has no value.

The columns after these four standard columns represent the 'path' to the current element, starting from the root element of the XML document in column 5 all the way down to the current element. The number of columns of the dataframe is therefore determined by the depth of the hierarchical structure of the XML document. Because every row carries the full path to its element, the hierarchical structure of the XML document can be read directly from the dataframe. All flatxml functions work with this flat XML dataframe.
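The path columns can be illustrated with a small hand-built dataframe. This is only a sketch: the dataframe below is typed in by hand rather than produced by flatxml, the path column names level1 and level2 are assumptions, and so is the use of NA for path columns deeper than the element itself.

```r
# Hand-built illustration (NOT produced by flatxml) of a flat XML dataframe
# for the document <cities><city>Berlin</city></cities>
flat <- data.frame(
  elem.   = c("cities", "city"),
  elemid. = c(1, 2),
  attr.   = c(NA_character_, NA_character_),
  value.  = c(NA, "Berlin"),
  level1  = c("cities", "cities"),  # assumed column name for path step 1
  level2  = c(NA, "city"),          # assumed column name for path step 2
  stringsAsFactors = FALSE
)

# Path columns are everything after the four standard columns
path_cols <- flat[, -(1:4), drop = FALSE]

# Depth of each element = number of non-NA path entries in its row
depth <- rowSums(!is.na(path_cols))

# Full path of each element, collapsed into a single string
paths <- apply(path_cols, 1, function(p) paste(p[!is.na(p)], collapse = "/"))

print(depth)
print(paths)
```

Selecting the path columns by position rather than by name keeps the sketch independent of how the columns are actually named.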

If an XML element has N attributes, it is represented by (N+1) rows in the flat XML dataframe: one row for the value (with dataframe$value. being NA if the element has no value) and one for each attribute. In the attribute rows, the names of the attributes are stored in the attr. field and their respective values in the value. field. Even if there are multiple rows for one XML element, the elem. and elemid. fields have the same value in all of these rows (because the rows belong to the same XML element).
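The row layout described above can be checked on a tiny document. The following is a minimal sketch, assuming the flatxml package is installed; the XML content and the temporary file are invented for illustration. By the description above, the city element (N = 2 attributes) should occupy three rows.

```r
library(flatxml)

# Invented example document: <city> has two attributes and a text value
xml_file <- tempfile(fileext = ".xml")
writeLines(
  '<cities><city id="1" country="DE">Berlin</city></cities>',
  xml_file
)

flat <- fxml_importXMLFlat(xml_file)

# All rows that belong to the <city> element
city_rows <- flat[flat$elem. == "city", ]

# The value row is the one whose attr. is NA
city_value <- as.character(city_rows$value.[is.na(city_rows$attr.)])

# The attribute rows, turned into a named vector (attribute name -> value)
is_attr <- !is.na(city_rows$attr.)
city_attrs <- setNames(
  as.character(city_rows$value.[is_attr]),
  as.character(city_rows$attr.[is_attr])
)

# elemid. is shared by every row of the same element
city_id <- unique(city_rows$elemid.)

print(city_rows)
```

Filtering on is.na(attr.) is the simplest way to separate an element's own value from its attributes, since both kinds of rows share the same elem. and elemid. values.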

Examples

library(flatxml)

# Load example file with population data from United Nations Statistics Division
example <- system.file("worldpopulation.xml", package="flatxml")
# Create flat dataframe from XML
xml.dataframe <- fxml_importXMLFlat(example)