Spark & Amazon S3
Introduction

Until now we have concentrated only on reading data from local file systems. That may be fine for some use cases, but it does not carry over to big data or cloud-based environments. Most readers will know Amazon Web Services (AWS) and the hundreds of services it offers. One of its earliest and most widely used services is Simple Storage Service, or simply S3. You can read more about S3 on this link.

In this blog entry we look at how to develop a Spark-based application that reads from and/or writes to AWS S3. It can later be deployed on the AWS cloud, but before we do that we need a program that works.

Before we begin, there are a couple of assumptions:

- You understand the basics of AWS Identity & Access Management, such as creating a user, an access key, and a secret access key. If not, check this link.
- You understand how …