Converts an XML document to a dataframe.
fxml_toDataFrame( xmlflat.df, siblings.of, same.tag = TRUE, attr.only = NULL, attr.not = NULL, elem.or.attr = "elem", col.attr = "", include.fields = NULL, exclude.fields = NULL )
Argument | Description |
---|---|
xmlflat.df | A flat XML dataframe created with |
siblings.of | ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID. |
same.tag | If |
attr.only | A list of named vectors representing attribute/value combinations the data records must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: |
attr.not | A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument |
elem.or.attr | Either |
col.attr | If |
include.fields | A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included. |
exclude.fields | A character vector with the names of the fields that should be excluded from the result dataframe. By default, no fields from the XML document are excluded. |
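The list structure described for attr.only can be sketched in plain R. This is only an illustration: the element name record, the attribute names type and year, and their values are made up and do not come from the package's example files.

```r
# Hypothetical filter: only <record> elements with type="country" and
# year="2020" would count as data records (all names and values are made up).
attr.only.filter <- list(
  record = c(type = "country", year = "2020")
)

# The list element name is the XML element the attributes belong to
names(attr.only.filter)

# The vector names are the attribute names, the vector elements their values
names(attr.only.filter$record)
unname(attr.only.filter$record)

# It would then be passed along these lines (xml.df is a hypothetical flat dataframe):
# fxml_toDataFrame(xml.df, siblings.of = 3, attr.only = attr.only.filter)
```

A list for attr.not would presumably be built the same way, with the difference that matching elements are left out instead of kept.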
A dataframe with the data read in from the XML document.
The data to be read in can be structured in one of two ways. In the first, each field is an XML element named after the field:
<record>
<field1>Value of field1</field1>
<field2>Value of field2</field2>
<field3>Value of field3</field3>
</record>
...
In this case elem.or.attr would need to be "elem" because the field names of the data records (field1, field2, field3) are the names of the elements.
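As a hedged illustration of the element-based layout, the sketch below writes a small made-up document to a temporary file and converts it. The wrapper elements <root> and <data>, the field values, and all variable names are assumptions for illustration. Looking up the record ID also assumes that the flat dataframe keeps element names in a column called elem., next to the elemid. column.

```r
# Made-up XML in the element-based layout (not one of the package's example files)
xml.text <- c(
  "<root>",
  "  <data>",
  "    <record><field1>A1</field1><field2>A2</field2><field3>A3</field3></record>",
  "    <record><field1>B1</field1><field2>B2</field2><field3>B3</field3></record>",
  "  </data>",
  "</root>"
)
xml.file <- tempfile(fileext = ".xml")
writeLines(xml.text, xml.file)

# Only run the conversion if the flatxml package is installed
if (requireNamespace("flatxml", quietly = TRUE)) {
  xml.df <- flatxml::fxml_importXMLFlat(xml.file)

  # ID of the first <record> element (assumes element names live in column elem.)
  record.id <- xml.df$elemid.[xml.df$elem. == "record"][1]

  # Field names are element names, so elem.or.attr = "elem"
  records.df <- flatxml::fxml_toDataFrame(xml.df, siblings.of = record.id,
                                          elem.or.attr = "elem")
  print(records.df)
}
```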
Alternatively, the XML data could look like this:
<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...
Here, the names of the fields are attributes, so elem.or.attr would need to be "attr" and col.attr would be set to "name", so fxml_toDataFrame() knows where to look for the field/column names.
In any case, siblings.of would be the ID (xmlflat.df$elemid.) of one of the <record> elements.
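For the attribute-based layout, a comparable sketch might look as follows. Again, the document content, the wrapper elements, and the variable names are invented for illustration, and the ID lookup assumes an elem. column holding the element names in the flat dataframe.

```r
# Made-up XML in which the field names sit in the "name" attribute
attr.xml.text <- c(
  "<root>",
  "  <data>",
  "    <record>",
  "      <column name=\"field1\">A1</column>",
  "      <column name=\"field2\">A2</column>",
  "    </record>",
  "    <record>",
  "      <column name=\"field1\">B1</column>",
  "      <column name=\"field2\">B2</column>",
  "    </record>",
  "  </data>",
  "</root>"
)
attr.xml.file <- tempfile(fileext = ".xml")
writeLines(attr.xml.text, attr.xml.file)

# Only run the conversion if the flatxml package is installed
if (requireNamespace("flatxml", quietly = TRUE)) {
  attr.xml.df <- flatxml::fxml_importXMLFlat(attr.xml.file)

  # siblings.of: ID of any one <record> element (here simply the first one found)
  first.record <- attr.xml.df$elemid.[attr.xml.df$elem. == "record"][1]

  # Field names come from the "name" attribute of each <column> element
  columns.df <- flatxml::fxml_toDataFrame(attr.xml.df, siblings.of = first.record,
                                          elem.or.attr = "attr", col.attr = "name")
  print(columns.df)
}
```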
library(flatxml)

# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe$elemid. == 3).
# The field names are given in the "name" attribute of the children elements of element no. 3
# and its siblings
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr", col.attr="name")

# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr", col.attr="name", exclude.fields=c("Value Footnote"))

# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe2$elemid. == 3).
# The field names are given as the names of the children elements of element no. 3
# and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")