Spark & XML
XML is another well-known document format worth exploring when working with Spark. It is used in many systems and is supposed to be “human readable”, though I doubt that holds when you look at some really big XML documents. Having said that, it is still possible to read, parse and understand an XML document in Spark. Spark does not have native support for XML as it does for JSON, but things are not all that bad: Databricks provides a library called Spark-XML for parsing XML documents, and actively maintains it.

Reading XML documents

To make it easier to understand how to read XML documents, this blog post is divided into two parts:

- Simple XML documents
- Nested XML documents

Before we can read any XML documents, we need to add the spark-xml library to our IntelliJ development environment. Add the following line to build.sbt …