Top AWS Interview Questions - Solved

AWS, or Amazon Web Services, is the leading cloud service provider, offering a wide variety of services and functionality. This page explores interview questions around AWS services related to ETL and data engineering, including S3 interview questions, Glue interview questions, crawlers, Lambda and more.

Q: What is the difference between delete and permanent delete in AWS S3 and how would you do it?

When you delete an S3 object (a soft delete), you can still restore it; when you permanently delete it, the object cannot be recovered.
S3 Versioning must be enabled on the bucket to achieve soft delete.
Below are the steps to permanently delete an object in S3:
Step 1: Enable the "Show versions" toggle.
Step 2: Select the specific version of the object you want to delete; this takes you to a confirmation page. If there are multiple versions, the topmost one is the latest.
Step 3: Type "permanently delete" in the confirmation field and delete. The version is permanently removed.
Below are the steps to achieve soft delete in S3:
Step 1: Leave the "Show versions" toggle disabled.
Step 2: Select the object you want to delete; this takes you to a confirmation page.
Step 3: Type "delete" in the confirmation field and delete. The object is soft deleted; when you enable "Show versions" you will see the object along with its delete marker, from which the object can be restored.
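The same two operations can be sketched with boto3. The bucket, key, and version ID below are made-up placeholders; the dicts only illustrate which parameters distinguish a soft delete from a permanent one:

```python
# Sketch (assumed names): soft vs. permanent delete on a versioned bucket.
soft_delete = {
    "Bucket": "my-bucket",      # placeholder bucket name
    "Key": "reports/2023.csv",  # placeholder object key
}
# On a versioned bucket, omitting VersionId only adds a delete marker (soft delete).
permanent_delete = {
    **soft_delete,
    "VersionId": "EXAMPLE-VERSION-ID",  # deleting a specific version is permanent
}
# With boto3 these would be issued as:
#   s3 = boto3.client("s3")
#   s3.delete_object(**soft_delete)       # restorable
#   s3.delete_object(**permanent_delete)  # not restorable
```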

Q: How to restore deleted file in AWS S3 ?

When you soft delete a file, S3 creates a delete marker. To restore the file, select the delete marker and delete it; removing the marker makes the previous version current again.
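Programmatically, restoring means finding the object's delete marker and then deleting it by its VersionId. A minimal sketch, assuming a response shaped like boto3's `list_object_versions` output (the sample data is made up):

```python
def find_delete_marker(versions_response, key):
    """Return the VersionId of the current delete marker for `key`, or None.

    `versions_response` is assumed to look like boto3's
    s3.list_object_versions(Bucket=...) response.
    """
    for marker in versions_response.get("DeleteMarkers", []):
        # IsLatest flags the marker that currently hides the object.
        if marker["Key"] == key and marker["IsLatest"]:
            return marker["VersionId"]
    return None

# Made-up sample response for illustration:
response = {
    "DeleteMarkers": [
        {"Key": "report.csv", "VersionId": "marker-123", "IsLatest": True},
    ],
    "Versions": [
        {"Key": "report.csv", "VersionId": "data-001", "IsLatest": False},
    ],
}
marker_id = find_delete_marker(response, "report.csv")
# Deleting this version would restore the object:
#   s3.delete_object(Bucket="my-bucket", Key="report.csv", VersionId=marker_id)
```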

Q: What is the size of "delete marker" in AWS S3 ?

When you soft delete a file in S3, S3 creates a delete marker with the same key but of type "delete marker", and its size is zero bytes.

Q: What are different encryptions available in AWS S3 ?

By default, AWS S3 objects are encrypted with Amazon S3 managed server-side encryption keys (SSE-S3), which use AES-256 encryption.
One can also use server-side encryption with AWS Key Management Service (SSE-KMS). This gives you more control over your keys and lets you audit when and by whom they are used.
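The difference shows up in the parameters of an upload. A sketch of the encryption-related parameters of `put_object` (bucket, key, and KMS alias are placeholders):

```python
# Sketch (assumed names): encryption parameters of an S3 upload.
default_sse = {
    "Bucket": "my-bucket",
    "Key": "data.csv",
    "Body": b"placeholder bytes",
    # No encryption parameters: S3 applies SSE-S3 (AES-256) by default.
}
kms_sse = {
    **default_sse,
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/my-app-key",  # placeholder KMS key alias
}
# With boto3: s3.put_object(**kms_sse)
```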

Q: What kind of Website can be hosted with AWS S3 ?

Static websites, which do not need server-side processing, can be hosted directly on AWS S3.
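Enabling static hosting amounts to attaching a website configuration to the bucket. A sketch of the configuration that boto3's `put_bucket_website` expects (document names are placeholder choices):

```python
# Sketch: S3 static website configuration (assumed document names).
website_configuration = {
    "IndexDocument": {"Suffix": "index.html"},  # served for directory-style requests
    "ErrorDocument": {"Key": "error.html"},     # served on 4xx errors
}
# With boto3:
#   s3.put_bucket_website(Bucket="my-bucket",
#                         WebsiteConfiguration=website_configuration)
```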

Q: Can we store a file with parquet format in AWS S3 ?

Yes, Parquet files can be stored in AWS S3. S3 is object storage and allows virtually any file format to be stored.

Q: How to analyze data stored in AWS S3 ?

AWS has a serverless service called Athena, a query engine that can be pointed at data in AWS S3 via the data catalog.
Because of Athena's serverless architecture, there is no cluster to scale up or down.
We can create a table in Athena using Athena DDL that is attached to the particular S3 location whose data you want to analyze with SQL.
ANSI SQL can then be used in the Athena editor to query and analyze the data.
This type of interview question tests your technical knowledge of the architecture of AWS S3, Athena, and the data catalog.
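The table-then-query flow above can be sketched in Athena DDL; the table name, schema, and S3 path below are made-up examples, not part of any real setup:

```sql
-- Hypothetical table over CSV files in S3 (names and schema are made up).
CREATE EXTERNAL TABLE sales (
    order_id    string,
    amount      double,
    order_date  date
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/sales/';

-- Then standard SQL in the Athena editor:
SELECT order_date, SUM(amount) AS total
FROM sales
GROUP BY order_date;
```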

Q: What is the default bucket limit in AWS S3 ?

The default bucket limit for S3 in an AWS account is 100 buckets. This can be raised by submitting a service quota increase request.

Q: How to give access to a user to a particular AWS S3 bucket but not to other buckets ?

This can be done with either IAM policies or bucket policies. IAM policies are identity-based: they are attached to users, groups, or roles and grant access to AWS services. Bucket policies are resource-based: they are attached to a specific S3 bucket and control access to that bucket. So, in this case, the user can be given IAM access to S3, and a bucket policy can be added to the restricted bucket so that only the admin can access it. This also explains the difference between an IAM policy and a bucket policy in brief.
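As an illustration, an identity-based policy scoped to a single bucket might look like the following. The bucket name and action list are placeholders, and this is a sketch rather than a complete production policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::allowed-bucket",
        "arn:aws:s3:::allowed-bucket/*"
      ]
    }
  ]
}
```

Note that both the bucket ARN and the `/*` object ARN are needed: `s3:ListBucket` applies to the bucket itself, while the object actions apply to the keys inside it.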

Q: What is the basic AWS Glue ETL Architecture?

At a high level, Glue has three main pieces: the Glue Data Catalog, which stores table metadata; crawlers, which scan data sources and populate the catalog; and Glue jobs, which run serverless Spark (or Python shell) scripts to extract, transform, and load data. Jobs can be started on a schedule or by triggers, and job runs are monitored through Glue Studio and CloudWatch.

Q: What are different types of load balancers available in AWS EC2 ?

There are three main types of load balancers available: Application Load Balancer, Network Load Balancer and Classic Load Balancer.
The Application Load Balancer works at the application layer (HTTP/HTTPS) and makes routing decisions based on request content, such as path or host.
The Network Load Balancer works at the transport layer (TCP/UDP), handling very high request rates per second with low latency.
The Classic Load Balancer is the legacy option, used mainly where applications were developed on the classic EC2 network, and balances load between EC2 instances.

Q: How is the schema of source data determined?

When an AWS Glue crawler connects to a data source, it uses classifiers to determine the schema of the data and creates the corresponding metadata in the AWS Glue Data Catalog.

Q: What are different AWS glue studio components ?

Within Glue Studio there are three kinds of nodes: data sources, transforms, and data targets.
Data sources include: S3, RDS, Kafka, Kinesis, JDBC.
Data targets include: S3, Glue Catalog tables (Redshift, RDS, etc.).
Transforms include: Join, Filter, ApplyMapping, and Custom Transform, where you can write your own Spark SQL or PySpark code.

Q: How to check error for failed glue job ?

Within Glue Studio there is a Monitoring tab.
In the Monitoring tab, scroll to the job runs section and select the particular Glue job.
Once you select the job run, open its details; the error message and failure reason are shown there.

Q: How to connect with data sources outside AWS ?

This is done with connectors in Glue Studio. Some connectors are already created by AWS and free to use; others can be found, and purchased if required, in the AWS Marketplace.

Q: What is the default checkpoint location in AWS Glue ?

AWS Glue manages checkpointing for us, keeping track of the operations performed within a Glue job. The default location is a temporary directory, which you can see and change in the "Job details" tab. It is a best practice to change this path to one that belongs only to that particular job, so jobs do not interfere with each other's state.

Q: How would you filter data using AWS glue visual code ?

This can be done in two ways:
We can use the "Filter" node from the Transform menu of the Glue visual editor.
OR
We can use the "Custom Transform" node from the Transform menu, where we can write custom PySpark or Spark SQL code to filter the data.
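The predicate a Filter node applies can be illustrated in plain Python. In a real Glue script the same lambda would be passed to the Filter transform over a DynamicFrame; here it runs over a made-up list of rows so the logic is easy to see:

```python
# Made-up rows standing in for DynamicFrame records.
rows = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": 40.0},
    {"order_id": 3, "amount": 120.0},
]

# The filter condition, written as it would be for Glue's Filter transform.
keep_large = lambda row: row["amount"] > 100

filtered = [row for row in rows if keep_large(row)]
print([row["order_id"] for row in filtered])  # prints [1, 3]
```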

Q: Which programming language can be used in AWS glue ?

Python (including PySpark) and Scala can be used to code in AWS Glue.
When we use the visual editor in Glue, the generated code can also be inspected and modified in the script editor provided by Glue Studio.
One can also write or edit the script in their own IDE before adding it to Glue Studio.

Q: What are Elastic views in AWS glue ?

AWS Glue Elastic Views helps combine and replicate data across multiple data stores without the need to write any custom code.
Elastic Views creates a replica of the source data in the target data store.
SQL can be used to query the data, which is returned in tabular format.

Q: Where is data stored in AWS Glue Database ?

The data resides in AWS S3 buckets. A folder inside an S3 bucket is used as the database, and within that folder there are separate table folders.
That S3 location is assigned in the AWS Glue database configuration, which then acts as the database. It can be understood as a logical organization of the data; Glue itself stores only the metadata.

Q: How to configure AWS Glue Database ?

Below are the steps:
Step 1: Go to AWS Glue Studio.
Step 2: Go to Databases, in the left section of the studio.
Step 3: Click the Add Database option, provide the DB name and the S3 location of the database, and then click Create.
The above steps will create the Glue database.
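The same steps can be done through the API. A sketch of the input that boto3's `glue.create_database` expects; the database name, S3 path, and description are placeholders:

```python
# Sketch (assumed names): Glue database definition backed by an S3 prefix.
database_input = {
    "Name": "sales_db",                         # placeholder database name
    "LocationUri": "s3://my-bucket/sales_db/",  # placeholder S3 location
    "Description": "Example Glue database over S3 data.",
}
# With boto3:
#   glue = boto3.client("glue")
#   glue.create_database(DatabaseInput=database_input)
```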

Q: What are AWS Glue Tables and how to create them ?

AWS Glue tables are logical representations of the data, in other words, metadata about the data.
The data itself resides in AWS S3 buckets. An S3 folder is configured as the table, and the files inside it are the table data.
Within a table folder there can also be partition folders.
We can create a table either manually or by creating a crawler.
When creating manually, we need to add the data fields and their types; a crawler automatically detects the file type, data types and partition columns.
With a crawler, you also need to provide an IAM role so that it can access S3.
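A crawler definition ties those pieces together: a role, a target catalog database, and the S3 path to scan. A sketch of the parameters boto3's `glue.create_crawler` takes (all names, the account ID, and the path are placeholders):

```python
# Sketch (assumed names): crawler that catalogs an S3 prefix into a Glue database.
crawler_config = {
    "Name": "sales-crawler",                                    # placeholder name
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",   # role with S3 read access
    "DatabaseName": "sales_db",                                 # catalog database for the tables
    "Targets": {"S3Targets": [{"Path": "s3://my-bucket/sales_db/orders/"}]},
}
# With boto3:
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_config)
#   glue.start_crawler(Name="sales-crawler")
```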

Q: What different AWS services are used for data engineering ?

This interview question tests the candidate's understanding of AWS services.
Below is a brief overview of common AWS services used for data engineering:
AWS S3 (Data Storage): S3, or Simple Storage Service, can be used to store data in almost any file format. It can serve as both input and output.
AWS Kinesis (Streaming Data): Amazon Kinesis is used to collect and analyze streaming data within AWS.
AWS Glue (ETL): Glue is Amazon's serverless platform for cloud ETL, or Extract, Transform and Load. A Glue catalog can be created and crawlers can be defined to fetch metadata; PySpark code can also be used for transformations.
AWS CloudWatch (Logging): CloudWatch is the service where job logs are collected, which can be analyzed for issues and failures.
AWS IAM (Access Management): IAM, or Identity and Access Management, is used to assign different kinds of access to different roles, which is essential for controlling data and job access.
AWS Athena (Query Service): Athena is used to analyze data stored in S3. We can create tables in Athena using Athena DDL that point to data locations in S3.
AWS EMR (Data Processing): EMR runs managed big data frameworks such as Spark and Hadoop, and is used where large data volumes are expected.
AWS DynamoDB (NoSQL Database): DynamoDB is a managed NoSQL key-value and document database suited to semi-structured data; for a relational database, RDS is the AWS option.