Spark & Ad-hoc Querying
Till now we have written programs which are essentially compiled jars & running on some Spark cluster. This works fine if the use case is batch processing data and that data is being consumed by some downstream applications. But think about if we(as users) want to use the raw computing power of Spark to do some ad-hoc SQL query. Aah, the good old world of an SQL command line or an SQL editor. Firing a bunch of queries. What about JDBC connections to some BI applications? Well, look no further – all that is possible!! In this blog post, we will look at how multiple users can interface with Spark to do Ad-hoc querying using Spark Thrift Server by creating a JDBC connection to it and fire some queries on data stored in Hive. Before we start on this let’s see what all is required to get going Access to a … Read more