Start Where You Are

Here is the file for this lab:

Here is an overview of what we will be doing in this lab. Along the way, you will see how cloud providers have built many managed, single-purpose services that make adopting the cloud much easier.

[Figure: overview of the lab workflow]

Step 1: Download the start file and upload it to a Cloud Storage bucket

Name your Cloud Storage bucket fnamelnamelabc (your first name + last name + labc) - you should remember how to do this from Lab B.

Leave the access settings at their defaults; you will only be accessing the bucket from a cloud-side processing tool, which already has access through your logged-in account.

Next, download the file carma_chronicles_labc.csv here: and upload it into the bucket. You can open the file on your computer first to see what it contains. We will then use services within the cloud environment to process this file.

carma_chronicles_labc.txt
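If you prefer the command line to the console, the same bucket-and-upload steps can be sketched with the gcloud CLI. This is an optional alternative, not part of the graded lab steps; it assumes you have installed the Google Cloud SDK, authenticated with `gcloud auth login`, and set a default project.

```shell
# Sketch of Step 1 using the gcloud CLI instead of the web console.
# Replace fnamelnamelabc with your own bucket name from above.
BUCKET=fnamelnamelabc

# Create the bucket with default access settings
gcloud storage buckets create gs://$BUCKET

# Upload the lab file from your current directory
gcloud storage cp carma_chronicles_labc.csv gs://$BUCKET/

# Verify that the file landed in the bucket
gcloud storage ls gs://$BUCKET/
```

Either path (console or CLI) leaves you with the same bucket state, so you can continue the lab from the console afterwards.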

Step 2: Launch the respective Cloud Side Processing Engines

GCP

You will be using Dataflow. In the search bar, search for Dataflow and launch it. If you are asked to enable it, do so.

Here are the steps you will have to perform to complete this lab. First, the cloud-side application needs to gain access to the data stored in the storage bucket. This step matters because it reflects an architectural decision that can shape a company's cloud bill: whether to use a hosted/managed processing tool or to build your own for custom needs. The figure below describes what we will do as part of this step. Take note of the options at each step, e.g. scheduling a recurring processing task, which may be important for certain real-world use cases.

[Figure: granting the cloud-side processing tool access to the Cloud Storage bucket]

Step 3: Configure Processing Workflow

Now for our actual task: we are going to count the number of times each unique word appears in our start file.
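Before configuring this in Dataflow, it helps to see what the computation itself does. Here is a minimal local sketch in plain Python of the same word-count idea, using a small hypothetical sample string in place of the lab file (Dataflow performs the equivalent computation at scale across workers):

```python
import re
from collections import Counter

def count_words(text: str) -> Counter:
    """Count how many times each unique word appears, case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample standing in for the contents of the lab file.
sample = "the cat saw the dog and the dog saw the cat"
counts = count_words(sample)
print(counts.most_common(2))  # the two most frequent words and their counts
```

The output of the Dataflow job will be the same kind of (word, count) pairs, just computed over the full file in the bucket.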

GCP