Scala Collections – Streams, Maps & Sets

Streams A Stream is similar to a List; however, it can have an unlimited number of elements. It is a Traversable, but lazy in nature: a Stream in Scala consists of a head and a lazily computed tail, i.e. elements are only evaluated when they are needed. This laziness is what makes infinite streams possible. Transformer functions like map and filter are also applied lazily, although care should be taken, as not all functions are lazy: methods like max and sum must traverse every element. Elements can be added to a stream as shown below. Keep in mind that when adding elements to a stream, the expression must be terminated with Stream.empty. Also note the difference between #::: (which concatenates and is only used after a stream) and #:: (which prepends a single element). Transformer methods such as filter and map are lazily evaluated, as shown below. Sets Set … Read more
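The ideas above can be sketched as follows. This is a minimal illustration, not code from the post: it builds a finite Stream terminated by Stream.empty, defines an infinite stream, and shows that filter is lazy while sum is strict. (In Scala 2.13, Stream is deprecated in favour of LazyList, but the same operators apply.)

```scala
object StreamSketch extends App {
  // Prepend elements with #::; the expression must end in a Stream,
  // here Stream.empty
  val s: Stream[Int] = 1 #:: 2 #:: 3 #:: Stream.empty

  // An infinite stream of natural numbers, possible only because the
  // tail is computed lazily
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  val naturals = from(0)

  // filter is a lazy transformer: nothing is computed until elements
  // are demanded by take/toList
  val evens = naturals.filter(_ % 2 == 0)
  println(evens.take(5).toList) // List(0, 2, 4, 6, 8)

  // Strict methods like sum force the whole stream; safe only on
  // finite streams
  println(s.sum) // 6
}
```

Calling sum or max on the infinite naturals stream would never terminate, which is why the post warns that not all methods are lazy.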

Scala Collections – Lists, Sequences & Vectors

Below, I have described some of the most commonly used collections. Starting with the regular ones like sequences, lists, maps and sets, we go on to discuss some of the more interesting ones like vectors and streams. Lists List is probably the easiest collection to understand and hence the most used. In Scala, a list is implemented as a class. I quote: "A class for immutable linked lists representing ordered collections of elements of type A". By default, all lists are immutable. However, the devil is in the detail. Check out the Scala documentation here: "Despite being an immutable collection, the implementation uses mutable state internally during construction. These state changes are invisible in single-threaded code but can lead to race conditions in some multi-threaded scenarios." You can declare lists in various ways, and they are documented below. Lists are efficient when the programs which use them are only … Read more
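As a rough sketch of the declaration styles the post describes (the value names here are illustrative, not from the post), here are a few equivalent ways to build an immutable list:

```scala
object ListSketch extends App {
  val a = List(1, 2, 3)       // the List factory method
  val b = 1 :: 2 :: 3 :: Nil  // cons cells, terminated by Nil
  val c = List.empty[Int]     // an explicit empty list
  val d = (1 to 3).toList     // converting a Range

  println(a == b) // true: same elements in the same order

  // Lists are immutable: prepending returns a new list and
  // leaves the original untouched
  val e = 0 :: a
  println(e) // List(0, 1, 2, 3)
  println(a) // List(1, 2, 3)
}
```

Prepending with :: is a constant-time operation on a linked list, which hints at the efficiency point the excerpt is about to make.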

Scala Collections – Introduction

Introduction I started looking at Scala collections after realising a need to polish some rough edges in my quest to become a better Scala programmer. This series of blogs on Scala collections is my attempt to get better at Scala. It has been written from a beginner's perspective, but I hope it will be equally helpful to seasoned programmers. Scala collections are a vast topic and there is something for everyone if you know where to start. A few commonly used collections are described here, and they are a good place to start. There are many collections which are not covered, but hopefully this blog will give you an idea of where to go and look and how to implement them. Let's start. Scala collections have a very well defined and documented hierarchy, available here. However, I have taken the liberty to insert a few images from the … Read more

Apache Airflow – First DAG

Now that we have a working Airflow installation, it is time to look at DAGs in detail. In the previous post, we saw how to execute DAGs from the UI. In this post, we will talk more about DAGs. DAGs are the core concept of Airflow. Simple DAG Simply speaking, a DAG is a Python script. Here is the code of a hello world DAG. It executes a command to print "helllooooo world". It may not do much, but it provides a lot of information about how to write an Airflow DAG. Understanding an Airflow DAG Remember, this code is stored in the $DAGS_FOLDER. Please refer to the previous blog, which has the details on the location. An important thing to note, and I quote from the Airflow website: One thing to wrap your head around (it may not be very intuitive for everyone at first) is that this Airflow Python script … Read more

Apache Airflow – Getting Started

I recently finished a project where Apache Airflow (just Airflow for short) was being touted as the next-generation Workflow Management System, and the whole place was going gaga over it. Well, that got me thinking about how I could understand and learn it. Hence this blog post. Here are some things you may want to know before getting your hands dirty with Apache Airflow. What is Airflow? The definition of Apache Airflow goes like this: Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. So Airflow executes a series of tasks which, when executed together, accomplish a business outcome. For those folks who have worked with the likes of Informatica, Airflow is similar to Workflow Designer; for those working in Oracle Data … Read more