In this tutorial, we will build a video-processing application using cloud resources from AWS. The focus of the tutorial will be predominantly on deploying the cloud resources using AWS CloudFormation, while the video-processing application will serve as a simple example. Here are the specific AWS cloud resources you will create during this tutorial:
Your final architecture will look like this:
As you can see from the diagram, our application will be replicated in two Availability Zones (AZs) and will reside in a VPC for increased security. Each AZ contains one public and two private subnets. Subnets allow for granular traffic control in our application. In the first AZ, our databases will reside in private subnet 3 and will only accept requests from private subnet 1. Private subnet 1 will host the web servers, which run on EC2 instances and are responsible for processing requests, logging them in, and returning results. The public subnet will host the Load Balancer, which will distribute incoming user requests evenly between the web servers to avoid overloading any individual one.
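To make the subnet layout described above more concrete, here is a small Python sketch that carves a VPC address range into one public and two private subnets per AZ. The VPC range `10.0.0.0/16`, the `/24` subnet size, and the role names are assumptions chosen for illustration; the tutorial's own templates may use different values.

```python
import ipaddress


def plan_subnets(vpc_cidr="10.0.0.0/16", new_prefix=24):
    """Split a VPC CIDR into one public and two private subnets per AZ.

    The CIDR, prefix length, and role names are illustrative assumptions.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the requested size.
    blocks = vpc.subnets(new_prefix=new_prefix)
    roles = ["public", "private-web", "private-db"]
    plan = {}
    for az in ("az1", "az2"):
        plan[az] = {role: str(next(blocks)) for role in roles}
    return plan


if __name__ == "__main__":
    for az, subnets in plan_subnets().items():
        for role, cidr in subnets.items():
            print(az, role, cidr)
```

Because every block comes from the same generator, the six subnets cannot overlap, which is the property a real VPC layout needs.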
For the video processing itself, users will input a YouTube link, and the application will create a short video with all the clips where a specific keyword is mentioned. Here is an example of such a video on YouTube: https://www.youtube.com/watch?v=8oQ8zbyCJd0. The application will download the YouTube video and its transcript, parse the transcript using regular expressions (RegEx), find words similar to the keyword, extract the timestamps between which those words occur, and then glue together all the video clips between those timestamps. You can find more details on how this is done in Part 6, where we create an AWS Lambda function and place the video-processing script lambda_handler.py there; the script is included in the tutorial’s assets linked in the Prerequisites section. We will do the video processing with AWS Lambda, which reduces the headache of infrastructure setup and management that comes with deploying EC2 servers or using containers.
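The transcript-parsing step described above can be sketched roughly as follows. This is not the tutorial's lambda_handler.py; it is a minimal illustration that assumes the transcript is a list of entries with `text`, `start`, and `duration` fields (in seconds), and that "similar words" means words beginning with the keyword. The function name `find_clips` and the `padding` parameter are hypothetical.

```python
import re


def find_clips(transcript, keyword, padding=1.0):
    """Return merged (start, end) intervals where the keyword is spoken.

    Assumed transcript format: [{"text": str, "start": float, "duration": float}, ...]
    """
    # Match the keyword and simple variants such as plurals, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\w*", re.IGNORECASE)

    hits = []
    for entry in transcript:
        if pattern.search(entry["text"]):
            start = max(0.0, entry["start"] - padding)
            end = entry["start"] + entry["duration"] + padding
            hits.append((start, end))

    # Merge overlapping intervals so the final video has no repeated footage.
    hits.sort()
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    sample = [
        {"text": "Clouds are great", "start": 0.0, "duration": 2.0},
        {"text": "the cloud scales", "start": 2.5, "duration": 2.0},
        {"text": "nothing here", "start": 10.0, "duration": 2.0},
    ]
    print(find_clips(sample, "cloud"))
```

The resulting intervals are what a video tool would then cut and concatenate; that cutting step is left out here because it depends on the library used in the actual script.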
For databases, we will have an Amazon RDS instance where we will store relational data such as users’ login info, links to the processed videos, and other metadata. For storing the videos, we will use S3 buckets, as they are a natural fit for object storage.
Overall, each component will be replicated multiple times across two Availability Zones to keep the system highly available. Now that we have an overview of the architecture in mind, let’s quickly take a look at the tool we’ll be using to define and build it: AWS CloudFormation.
We will be building quite a big and complex system, so imagine what a hassle it would be to do it manually with the AWS Console. We would need to manually set up the VPC, subnets, EC2 instances, RDS, S3, and more. That’s a lot of manual work that we definitely don’t want to repeat more than once. Instead, we will be using AWS CloudFormation, an Infrastructure-as-Code (IaC) tool that allows us to automate cloud resource provisioning. We will outline our resources in .yaml files, which CloudFormation will then use to create the resources for us. The best thing is that if we need to change something, we can change a line of code in the .yaml file and easily redeploy the entire system. It might seem confusing at first where we are getting all the commands and parameters for our .yaml files, but you’ll see that all we are doing is replicating manual steps with code, and the AWS CloudFormation documentation tells us how to do that. Next, let’s make sure we have all the prerequisites to be able to complete the tutorial.
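Before moving on, here is a rough illustration of the Infrastructure-as-Code idea. The tutorial writes its templates in .yaml; this Python sketch instead builds an equivalent minimal template as a dictionary and prints it as JSON, which CloudFormation also accepts. The function name `build_template`, the logical name `VPC`, and the CIDR value are assumptions for illustration and are not taken from the tutorial's assets.

```python
import json


def build_template(vpc_cidr="10.0.0.0/16"):
    """Build a minimal CloudFormation template containing a single VPC.

    The CIDR block and the logical resource name are illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal sketch: a single VPC",
        "Resources": {
            "VPC": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            }
        },
    }


if __name__ == "__main__":
    # Changing one argument changes the infrastructure definition;
    # no clicking through the console is needed.
    print(json.dumps(build_template("10.1.0.0/16"), indent=2))
```

The point is only that the whole resource definition lives in a text file: editing one value and redeploying replaces a series of manual console steps.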
Make sure to fulfill these prerequisites to be able to complete the tutorial:
Now, with some theory and prerequisites in place, let’s get to building our application.