Section 6-10

Section 6: Amazon S3

6.1 Amazon S3 - Introduction

1. Amazon S3 - Buckets

  • Amazon S3 allows people to store objects (files) in “buckets” (directories).
  • Buckets must have a globally unique name (across all Regions and all accounts).
  • Buckets are defined at the region level
  • S3 looks like a global service but buckets are created in a region
  • Naming convention
    • No uppercase, No underscore
    • 3-63 characters long
    • Not formatted as an IP address
    • Must start with lowercase letter or number
    • Must NOT start with the prefix xn--
    • Must NOT end with the suffix -s3alias
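
The naming rules above can be pre-checked locally before calling AWS. A minimal Python sketch, assuming the only allowed characters are lowercase letters, digits, dots and hyphens. AWS enforces a few more rules than the ones listed here, so treat the hypothetical `is_valid_bucket_name` helper as an approximation rather than the authoritative validator:

```python
import ipaddress
import re

# Assumption: only lowercase letters, digits, dots and hyphens are allowed.
# This covers "no uppercase, no underscore" from the rules above.
_ALLOWED_CHARS = re.compile(r"[a-z0-9.-]+")


def is_valid_bucket_name(name: str) -> bool:
    """Approximate check of the S3 bucket naming rules listed above."""
    if not 3 <= len(name) <= 63:                        # 3-63 characters long
        return False
    if not _ALLOWED_CHARS.fullmatch(name):              # no uppercase, no underscore
        return False
    if not (name[0].isdigit() or name[0].islower()):    # start with letter or number
        return False
    try:                                                # not formatted as an IP
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        pass
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    return True


for candidate in ["my-bucket", "My_Bucket", "192.168.5.4", "xn--bucket"]:
    print(candidate, is_valid_bucket_name(candidate))
```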

2. Amazon S3 - Objects

  • Objects (files) have a Key
  • The key is the FULL path of the object within the bucket:
    • s3://my-bucket/my_file.txt
    • s3://my-bucket/my_folder1/another_folder/my_file.txt
  • The key is composed of prefix + object name
    • s3://my-bucket/my_folder1/another_folder/my_file.txt (prefix: my_folder1/another_folder/, object name: my_file.txt)
  • There’s no concept of “directories” within buckets (although the console UI will make you think otherwise)
  • Just keys with very long names that contain slashes (“/”)
  • Object values are the content of the body:
    • Max. Object Size is 5TB (5000GB)
    • If uploading more than 5GB, must use “multi-part upload”
  • Metadata (list of text key / value pairs – system or user metadata)
  • Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
  • Version ID (if versioning is enabled)
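
Since there are no real directories, a “folder” is only the part of the key before the last slash. A plain-Python sketch (no AWS SDK; `split_s3_uri` is a hypothetical helper) that splits the example URI into bucket, key, prefix and object name:

```python
from typing import Dict


def split_s3_uri(uri: str) -> Dict[str, str]:
    """Split an s3:// URI into bucket, key, prefix and object name."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3:// URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    # The "folders" are just part of the key: everything up to the last "/".
    prefix, slash, object_name = key.rpartition("/")
    return {
        "bucket": bucket,
        "key": key,
        "prefix": prefix + slash,
        "object_name": object_name,
    }


print(split_s3_uri("s3://my-bucket/my_folder1/another_folder/my_file.txt"))
```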

6.2 Amazon S3 – Security

  • User-Based
    • IAM Policies – which API calls should be allowed for a specific user from IAM
  • Resource-Based
    • Bucket Policies – bucket-wide rules from the S3 console; allows cross-account access
    • Object Access Control List (ACL) – finer grain (can be disabled)
    • Bucket Access Control List (ACL) – less common (can be disabled)
  • Note: an IAM principal can access an S3 object if
    • The user IAM permissions ALLOW it OR the resource policy ALLOWS it
    • AND there’s no explicit DENY
  • Encryption: encrypt objects in Amazon S3 using encryption keys
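
The access rule in the note above can be written as a small boolean function. This is a simplified sketch that assumes a same-account request and ignores the other inputs AWS also evaluates (ACLs, SCPs, permission boundaries, session policies); `can_access` is a hypothetical name:

```python
def can_access(iam_allows: bool, resource_policy_allows: bool,
               explicit_deny: bool) -> bool:
    """Simplified same-account S3 access decision for an IAM principal."""
    if explicit_deny:  # an explicit DENY anywhere always wins
        return False
    # Otherwise one ALLOW is enough: the IAM policy OR the bucket/resource policy.
    return iam_allows or resource_policy_allows


# Walk through every combination of the three inputs.
for iam in (False, True):
    for bucket_policy in (False, True):
        for deny in (False, True):
            print(iam, bucket_policy, deny, "->", can_access(iam, bucket_policy, deny))
```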

6.3 Amazon S3 – Static Website Hosting

  • S3 can host static websites and make them accessible on the Internet.
  • The website URL will be (depending on the region)
    • http://bucket-name.s3-website-aws-region.amazonaws.com OR
    • http://bucket-name.s3-website.aws-region.amazonaws.com
  • If you get a 403 Forbidden error, make sure the bucket policy allows public reads!
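
A typical fix for the 403 error is a bucket policy granting `s3:GetObject` on every object to everyone. A sketch that builds such a policy as JSON, assuming the hypothetical bucket name `my-website-bucket` and assuming the bucket's Block Public Access settings already permit public policies (otherwise S3 rejects the policy):

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Build a bucket policy that lets anyone read (GetObject) every object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # "/*" targets the objects in the bucket, not the bucket itself
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("my-website-bucket"))
```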

6.4 Amazon S3 - Versioning

  • You can version your files in Amazon S3
  • It is enabled at the bucket level
  • Overwriting the same key creates a new “version”: 1, 2, 3…
  • It is best practice to version your buckets
    • Protect against unintended deletes (ability to restore a version)
    • Easy roll back to previous version
  • Notes:
    • Any file that is not versioned prior to enabling versioning will have version “null”
    • Suspending versioning does not delete the previous versions
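
To make the versioning notes concrete, here is a toy in-memory model, not the S3 API. It assumes sequential version IDs to match the “1, 2, 3…” description (real S3 version IDs are opaque strings) and models a delete on a versioned bucket as a delete marker:

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy in-memory model of S3 versioning (not the real S3 API)."""

    def __init__(self) -> None:
        self.versioning_enabled = False
        self._ids = itertools.count(1)
        # key -> [(version_id, body)], oldest first; body None = delete marker
        self._history: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}

    def _write(self, key: str, body: Optional[bytes]) -> str:
        version_id = str(next(self._ids)) if self.versioning_enabled else "null"
        history = self._history.setdefault(key, [])
        if version_id == "null":
            # Unversioned writes overwrite the single "null" version.
            history[:] = [entry for entry in history if entry[0] != "null"]
        history.append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        return self._write(key, body)

    def delete(self, key: str) -> str:
        # With versioning on, nothing is erased: a delete marker becomes the
        # latest version. Without versioning it replaces the "null" version.
        return self._write(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self._history.get(key, [])
        if version_id is None:  # latest version
            if not history or history[-1][1] is None:
                raise KeyError(key)  # missing, or hidden by a delete marker
            return history[-1][1]
        for vid, body in history:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))

    def version_ids(self, key: str) -> List[str]:
        return [vid for vid, _ in self._history.get(key, [])]


bucket = VersionedBucket()
bucket.put("index.html", b"v0")           # before versioning: version "null"
bucket.versioning_enabled = True
bucket.put("index.html", b"v1")           # version "1"
bucket.delete("index.html")               # delete marker, version "2"
print(bucket.version_ids("index.html"))   # ['null', '1', '2']
print(bucket.get("index.html", "1"))      # restore by reading version 1
```

Suspending versioning in this model (setting `versioning_enabled = False`) leaves every existing version in place, matching the last note above.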

6.5 Amazon S3 – Replication (CRR & SRR)

  • Must enable Versioning in source and destination buckets
  • Cross-Region Replication (CRR)
  • Same-Region Replication (SRR)
  • Buckets can be in different AWS accounts
  • Copying is asynchronous
  • Must give proper IAM permissions to S3
  • Use cases:
    • CRR – compliance, lower latency access, replication across accounts
    • SRR – log aggregation, live replication between production and test accounts
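
The prerequisites above can be expressed as a small check. A sketch with a hypothetical `Bucket` record and `replication_kind` helper; it models the rules only and does not call S3 (bucket names and account IDs are made up):

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    region: str
    account_id: str
    versioning_enabled: bool


def replication_kind(source: Bucket, destination: Bucket) -> str:
    """Return "CRR" or "SRR", enforcing the versioning prerequisite."""
    for bucket in (source, destination):
        if not bucket.versioning_enabled:
            raise ValueError(f"enable versioning on {bucket.name} first")
    # The accounts may differ either way; only the Regions decide CRR vs SRR.
    return "SRR" if source.region == destination.region else "CRR"


prod = Bucket("prod-logs", "eu-west-1", "111111111111", True)
staging = Bucket("staging-logs", "eu-west-1", "222222222222", True)
dr_copy = Bucket("dr-copy", "us-east-1", "111111111111", True)
print(replication_kind(prod, staging))   # SRR, even across accounts
print(replication_kind(prod, dr_copy))   # CRR
```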

6.6 S3 Storage Classes

Lifecycle Rules can be used to define when S3 objects should be transitioned to another storage class or when objects should be deleted after some time.
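
As an example, a lifecycle configuration in the JSON shape accepted by the S3 `PutBucketLifecycleConfiguration` API (for instance via boto3's `put_bucket_lifecycle_configuration`). The rule ID, the `logs/` prefix and the day counts are hypothetical; `GLACIER` is the API name for Glacier Flexible Retrieval:

```python
import json

# Hypothetical rule: objects under "logs/" move to Standard-IA after 30 days,
# to Glacier Flexible Retrieval after 90 days, and are deleted after 365 days.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```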

1. S3 Durability and Availability

  • Durability:
    • High durability (99.999999999%, 11 9’s) of objects across multiple AZs
    • If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
    • Same for all storage classes
  • Availability:
    • Measures how readily available a service is
    • Varies depending on storage class
    • Example: S3 Standard has 99.99% availability, i.e. it may be unavailable about 53 minutes a year
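
The availability and durability figures above follow from simple arithmetic. A sketch using `Decimal` to avoid floating-point noise, assuming a 365-day year:

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Expected unavailable minutes per year for a given availability %."""
    unavailable_fraction = 1 - Decimal(availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


def years_per_lost_object(durability_percent: str, object_count: int) -> Decimal:
    """Average years between single-object losses at a given durability %."""
    annual_loss_rate = 1 - Decimal(durability_percent) / 100
    expected_losses_per_year = annual_loss_rate * object_count
    return 1 / expected_losses_per_year


print(downtime_minutes_per_year("99.99"))  # 52.5600, i.e. about 53 minutes
print(downtime_minutes_per_year("99.9"))   # 525.600, i.e. about 8.8 hours
print(int(years_per_lost_object("99.999999999", 10_000_000)))  # 10000
```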

2. S3 Standard – General Purpose

  • 99.99% Availability
  • Used for frequently accessed data
  • Low latency and high throughput
  • Sustain 2 concurrent facility failures
  • Use Cases: Big Data analytics, mobile & gaming applications, content distribution…

3. S3 Storage Classes – Infrequent Access

  • For data that is less frequently accessed, but requires rapid access when needed.
  • Lower cost than S3 Standard.
  • Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
    • 99.9% Availability
    • Use cases: Disaster Recovery, backups
    • Amazon S3 Standard-Infrequent Access allows you to store infrequently accessed data with rapid access when needed. It has high durability and is stored in several Availability Zones to avoid data loss in case of a disaster, so it can be used to store data for disaster recovery, backups, etc.
  • Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
    • High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
    • 99.5% Availability
    • Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate

4. Amazon S3 Glacier Storage Classes

  • Low-cost object storage meant for archiving / backup.
  • Pricing: price for storage + object retrieval cost.
  • Amazon S3 Glacier Instant Retrieval
    • Millisecond retrieval, great for data accessed once a quarter
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier):
    • Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Deep Archive – for long term storage:
    • Standard (12 hours), Bulk (48 hours)
    • Minimum storage duration of 180 days
    • The most cost-effective option if you want to archive data and do not have a retrieval time requirement. You can retrieve data within 12 or 48 hours.
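
The minimum storage durations matter when objects are deleted early: you are still charged for the remaining days of the minimum. A simplified sketch of that rule, using the S3 API storage-class names `GLACIER_IR`, `GLACIER` and `DEEP_ARCHIVE` for Instant Retrieval, Flexible Retrieval and Deep Archive (`billed_storage_days` is hypothetical):

```python
# Minimum billable storage durations, in days, for the Glacier classes above
# (keys are the storage-class names used by the S3 API).
MIN_STORAGE_DAYS = {"GLACIER_IR": 90, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def billed_storage_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged: an early delete still pays up to the minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billed_storage_days("DEEP_ARCHIVE", 30))   # 180: deleted early
print(billed_storage_days("GLACIER", 120))       # 120: past the minimum
```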

5. S3 Intelligent-Tiering

  • Small monthly monitoring and auto-tiering fee.
  • Moves objects automatically between Access Tiers based on usage.
  • There are no retrieval charges in S3 Intelligent-Tiering.
  • Frequent Access tier (automatic): default tier
  • Infrequent Access tier (automatic): objects not accessed for 30 days
  • Archive Instant Access tier (automatic): objects not accessed for 90 days
  • Archive Access tier (optional): configurable from 90 days to 700+ days
  • Deep Archive Access tier (optional): configurable from 180 days to 700+ days
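
The tier rules above can be read as a lookup on the number of days since the object was last accessed. A toy sketch (S3 does this for you; `intelligent_tier` is hypothetical), where the two optional archive tiers apply only if you opted in with your own thresholds:

```python
from typing import Optional


def intelligent_tier(days_since_access: int,
                     archive_after_days: Optional[int] = None,
                     deep_archive_after_days: Optional[int] = None) -> str:
    """Pick the Intelligent-Tiering access tier for an object (toy model)."""
    # Optional tiers: only used if configured (thresholds as described above).
    if deep_archive_after_days is not None and days_since_access >= deep_archive_after_days:
        return "Deep Archive Access"
    if archive_after_days is not None and days_since_access >= archive_after_days:
        return "Archive Access"
    # Automatic tiers.
    if days_since_access >= 90:
        return "Archive Instant Access"
    if days_since_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


print(intelligent_tier(45))                          # Infrequent Access
print(intelligent_tier(200, archive_after_days=90))  # Archive Access
```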

6.7 AWS Snow Family

  • Highly secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS.

6.8 Amazon S3 – Summary

  • Buckets vs Objects: global unique name, tied to a region
  • S3 security: IAM policy, S3 Bucket Policy (public access), S3 Encryption
  • S3 Websites: host a static website on Amazon S3
  • S3 Versioning: multiple versions for files, prevent accidental deletes
  • S3 Replication: same-region or cross-region, must enable versioning
  • S3 Storage Classes: Standard, IA, 1Z-IA, Intelligent, Glacier (Instant, Flexible, Deep)
  • Snow Family: import data onto S3 through a physical device, edge computing
  • OpsHub: desktop application to manage Snow Family devices
  • Storage Gateway: hybrid solution to extend on-premises storage to S3

Section 7: Databases & Analytics

  • Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
  • Sometimes, you want to store data in a database…
    • You can structure the data
    • You build indexes to efficiently query / search through the data
    • You define relationships between your datasets
  • Databases are optimized for a purpose and come with different features, shapes and constraints

7.1 NoSQL Databases

  • NoSQL = non-SQL = non relational databases
  • NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications.
  • Benefits:
    • Flexibility: easy to evolve data model
    • Scalability: designed to scale-out by using distributed clusters
    • High-performance: optimized for a specific data model
    • Highly functional: types optimized for the data model
  • Examples: Key-value, document, graph, in-memory, search databases

7.2 AWS RDS

  • RDS stands for Relational Database Service
  • It’s a managed database service for databases that use SQL as a query language.
  • It allows you to create databases in the cloud that are managed by AWS:
    • Postgres
    • MySQL
    • MariaDB
    • Oracle
    • Microsoft SQL Server
    • Aurora (AWS Proprietary database)

1. Advantages of using RDS versus deploying a DB on EC2

  • RDS is a managed service:
    • Automated provisioning, OS patching
    • Continuous backups and restore to specific timestamp (Point in Time Restore)!
    • Monitoring dashboards
    • Read replicas for improved read performance
    • Multi AZ setup for DR (Disaster Recovery)
    • Maintenance windows for upgrades
    • Scaling capability (vertical and horizontal)
    • Storage backed by EBS (gp2 or io1)
  • BUT you can’t SSH into your instances

2. RDS Deployments: Read Replicas, Multi-AZ

  • RDS Multi-AZ deployments’ main purpose is high availability, and RDS Read Replicas’ main purpose is scalability.

3. RDS Deployments: Multi-Region

  • Multi-Region deployments’ main purpose is disaster recovery and local performance.

7.3 Amazon Aurora

  • Aurora is a proprietary technology from AWS (not open sourced).
  • PostgreSQL and MySQL are both supported as Aurora DB
  • Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 64 TB.
  • Aurora costs more than RDS (20% more) – but is more efficient
  • Not in the free tier

7.4 Amazon ElastiCache

  • The same way RDS is to get managed Relational Databases…
  • ElastiCache is to get managed Redis or Memcached.
  • Caches are in-memory databases with high performance and low latency.
  • Helps reduce the load on databases for read-intensive workloads
  • AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
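
A common way to use ElastiCache is the cache-aside (lazy loading) pattern: read from the cache first and only query the database on a miss. A minimal sketch in which a Python dict stands in for Redis or Memcached and `load_from_db` is a hypothetical database call:

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading) in front of a slower database read."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float) -> None:
        self._load_from_db = load_from_db
        self._ttl_seconds = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key: str) -> str:
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                  # cache hit: the database is not touched
        self.db_reads += 1                   # miss or expired: go to the database
        value = self._load_from_db(key)
        self._cache[key] = (time.monotonic() + self._ttl_seconds, value)
        return value


# Hypothetical database call; 100 reads of the same key hit the database once.
cache = CacheAside(load_from_db=lambda key: f"row for {key}", ttl_seconds=60)
for _ in range(100):
    cache.get("user:42")
print(cache.db_reads)  # 1
```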

7.5 Amazon DynamoDB

  • Fully managed, highly available with replication across 3 AZs
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed “serverless” database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance
  • Single-digit millisecond latency – low latency retrieval
  • Integrated with IAM for security, authorization and administration
  • Low cost and auto scaling capabilities
  • Standard & Infrequent Access (IA) Table Class

1. DynamoDB – type of data

  • DynamoDB is a key/value database

2. DynamoDB Accelerator - DAX

  • Fully managed in-memory cache for DynamoDB
  • 10x performance improvement – single-digit millisecond latency to microseconds latency – when accessing your DynamoDB tables
  • Secure, highly scalable & highly available
  • Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases

3. DynamoDB – Global Tables

  • Make a DynamoDB table accessible with low latency in multiple Regions
  • Active-Active replication (read/write to any AWS Region).

7.6 Amazon Redshift

  • Redshift is based on PostgreSQL, but it’s not used for OLTP.
  • It’s OLAP – online analytical processing (analytics and data warehousing).
  • Load data once every hour, not every second
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row based).
  • Massively Parallel Query Execution (MPP), highly available.
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries.
  • BI tools such as Amazon QuickSight or Tableau integrate with it
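
Why columnar storage helps OLAP queries can be shown with a toy table. This only illustrates the idea; Redshift's actual on-disk format (blocks, per-column compression) is far more involved:

```python
# Toy sales table, stored two ways.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]

# Row-based layout (OLTP-style): each record is stored together (the list above).
# Columnar layout (Redshift-style): each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytics query such as SUM(amount) only needs one column, so a columnar
# store reads 3 values instead of scanning all 9 fields of every row.
total_amount = sum(columns["amount"])
values_read_columnar = len(columns["amount"])
values_read_row_based = sum(len(row) for row in rows)
print(total_amount, values_read_columnar, values_read_row_based)  # 245.5 3 9
```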

7.7 Amazon EMR Elastic MapReduce

  • EMR stands for “Elastic MapReduce”.
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data.
  • The clusters can be made of hundreds of EC2 instances
  • Also supports Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data…

7.8 Amazon Athena

  • Serverless query service to analyze data stored in Amazon S3.
  • Uses standard SQL language to query the files
  • Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
  • Pricing: $5.00 per TB of data scanned
  • Use compressed or columnar data for cost-savings (less scan)
  • Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc.
  • Exam Tip: analyze data in S3 using serverless SQL, use Athena
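
The pricing bullet turns into a one-line cost estimate. A sketch assuming the $5.00 per TB figure above (check current regional pricing) and a hypothetical 1 TB CSV dataset that, once converted to compressed Parquet, is assumed to need only about 0.05 TB scanned for the same query:

```python
PRICE_PER_TB_SCANNED_USD = 5.00  # figure quoted above; verify current pricing


def athena_query_cost_usd(tb_scanned: float) -> float:
    """Approximate Athena cost of one query from the terabytes it scans."""
    return round(tb_scanned * PRICE_PER_TB_SCANNED_USD, 2)


# Hypothetical: the same query over raw CSV vs. compressed, columnar Parquet.
csv_cost = athena_query_cost_usd(1.0)        # scans the whole 1 TB
parquet_cost = athena_query_cost_usd(0.05)   # assumed: only ~5% is read
print(csv_cost, parquet_cost)                # 5.0 0.25
```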

7.9 Amazon QuickSight

  • Serverless machine learning-powered business intelligence service to create interactive dashboards.
  • Fast, automatically scalable, embeddable, with per-session pricing
  • Use cases:
    • Business analytics
    • Building visualizations
    • Perform ad-hoc analysis
    • Get business insights using data
  • Integrated with RDS, Aurora, Athena, Redshift, S3…

7.10 Amazon DocumentDB

  • Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
  • DocumentDB is the same for MongoDB (which is a NoSQL database)
  • MongoDB is used to store, query, and index JSON data
  • Similar “deployment concepts” as Aurora
  • Fully Managed, highly available with replication across 3 AZ
  • DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
  • Automatically scales to workloads with millions of requests per second

7.11 Amazon Neptune

  • Fully managed graph database.
  • A popular graph dataset would be a social network
    • Users have friends
    • Posts have comments
    • Comments have likes from users
    • Users share and like posts…
  • Highly available across 3 AZ, with up to 15 read replicas
  • Build and run applications working with highly connected datasets – optimized for these complex and hard queries.
  • Can store up to billions of relations and query the graph with milliseconds latency.
  • Highly available with replications across multiple AZs
  • Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking.
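
A classic graph query is “friends of friends”, the basis of many recommendation engines. In Neptune you would write it in a graph query language such as Gremlin, openCypher or SPARQL; the sketch below only shows the idea in plain Python, on a hypothetical social graph stored as an adjacency list:

```python
from collections import Counter
from typing import List

# Hypothetical social graph: user -> set of friends (undirected edges).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def recommend_friends(user: str) -> List[str]:
    """Suggest friends-of-friends, ranked by number of mutual friends."""
    mutual = Counter(
        friend_of_friend
        for friend in friends[user]
        for friend_of_friend in friends[friend]
        if friend_of_friend != user and friend_of_friend not in friends[user]
    )
    return [name for name, _ in mutual.most_common()]


print(recommend_friends("alice"))  # ['dave', 'erin']: dave shares 2 friends
```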

7.12 Amazon Quantum Ledger Database(QLDB)

  • QLDB stands for “Quantum Ledger Database”.
  • A ledger is a book recording financial transactions.
  • Fully managed, serverless, highly available, replication across 3 AZs
  • Used to review the history of all the changes made to your application data over time.
  • Immutable system: no entry can be removed or modified, cryptographically verifiable.
  • 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
  • Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules.
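
The “immutable, cryptographically verifiable” property comes from hashing: each journal entry's hash covers the previous entry's hash, so changing any past entry breaks every hash after it. A concept sketch with SHA-256; the helper names are hypothetical and QLDB's actual journal and digest format differ:

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64


def entry_hash(previous_hash: str, data: Dict) -> str:
    """Hash of an entry: covers its data AND the previous entry's hash."""
    payload = previous_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(journal: List[Dict], data: Dict) -> None:
    previous_hash = journal[-1]["hash"] if journal else GENESIS_HASH
    journal.append({"data": data, "hash": entry_hash(previous_hash, data)})


def verify(journal: List[Dict]) -> bool:
    """Recompute the chain; any edited entry makes verification fail."""
    previous_hash = GENESIS_HASH
    for entry in journal:
        if entry["hash"] != entry_hash(previous_hash, entry["data"]):
            return False
        previous_hash = entry["hash"]
    return True


journal: List[Dict] = []
append_entry(journal, {"account": "A-1", "amount": 100})
append_entry(journal, {"account": "A-1", "amount": -30})
print(verify(journal))                   # True
journal[0]["data"]["amount"] = 1_000     # tamper with history
print(verify(journal))                   # False
```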

7.13 Amazon Managed Blockchain

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
  • Amazon Managed Blockchain is a managed service to:
    • Join public blockchain networks.
    • Or create your own scalable private network.
  • Compatible with the frameworks Hyperledger Fabric & Ethereum.

7.14 AWS Glue

  • Managed extract, transform, and load (ETL) service.
  • Useful to prepare and transform data for analytics
  • Fully serverless service
  • Glue Data Catalog: catalog of datasets
    • The AWS Glue Data Catalog is a central repository to store structural and operational metadata for all your data assets. For a given data set, you can store its table definition and physical location, add business-relevant attributes, and track how this data has changed over time.
    • Can be used by Athena, Redshift, EMR

7.15 DMS – Database Migration Service

  • Quickly and securely migrate databases to AWS; resilient, self-healing.
  • The source database remains available during the migration.
  • Supports:
    • Homogeneous migrations: e.g. Oracle to Oracle
    • Heterogeneous migrations: e.g. Microsoft SQL Server to Aurora

7.16 Databases & Analytics Summary in AWS

  • Relational Databases - OLTP: RDS & Aurora (SQL)
  • Differences between Multi-AZ, Read Replicas, Multi-Region.
  • In-memory Database: ElastiCache
  • Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
  • Warehouse - OLAP: Redshift (SQL)
  • Hadoop Cluster: EMR
  • Athena: query data on Amazon S3 (serverless & SQL)
  • QuickSight: dashboards on your data (serverless)
  • DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
  • Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable).
  • Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains.
  • Glue: Managed ETL (Extract Transform Load) and Data Catalog service.
  • Database Migration: DMS
  • Neptune: graph database

Section 8: Other Compute Services: ECS, Lambda, Batch, Lightsail

8.1 Amazon Elastic Container Service (ECS)

  • ECS = Elastic Container Service
  • Launch Docker containers on AWS
  • You must provision & maintain the infrastructure (the EC2 instances).
  • AWS takes care of starting / stopping containers
  • Has integrations with the Application Load Balancer

8.2 Fargate

  • Launch Docker containers on AWS
  • You do not provision the infrastructure (no EC2 instances to manage), which is simpler!
  • Serverless offering
  • AWS just runs containers for you based on the CPU / RAM you need

8.3 ECR – Elastic Container Registry

  • Elastic Container Registry
  • Private Docker Registry on AWS
  • This is where you store your Docker images so they can be run by ECS or Fargate.

8.4 What’s serverless?

  • Serverless is a new paradigm in which the developers don’t have to manage servers anymore…
  • They just deploy code
  • They just deploy… functions !
  • Initially... Serverless == FaaS (Function as a Service)
  • Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.”
  • Serverless does not mean there are no servers… it means you just don’t manage / provision / see them.

8.5 Amazon Lambda

1. Benefits of AWS Lambda

  • Easy Pricing:
    • Pay per request and compute time
    • Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
  • Integrated with the whole AWS suite of services
  • Event-Driven: functions get invoked by AWS when needed
  • Integrated with many programming languages
  • Easy monitoring through AWS CloudWatch
  • Easy to get more resources per function (up to 10GB of RAM!)
  • Increasing RAM will also improve CPU and network!

2. AWS Lambda Pricing

  • Pay per calls:
    • First 1,000,000 requests are free
    • $0.20 per 1 million requests thereafter ($0.0000002 per request)
  • Pay per duration (in increments of 1 ms):
    • 400,000 GB-seconds of compute time per month for FREE
    • == 400,000 seconds if function is 1GB RAM
    • == 3,200,000 seconds if function is 128 MB RAM
    • After that, $1.00 for 60,000 GB-seconds
  • It is usually very cheap to run AWS Lambda so it’s very popular
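
Putting the pricing bullets together gives a quick monthly estimate. A sketch assuming x86 prices of $0.20 per million requests and $0.0000166667 per GB-second (roughly $1.00 per 60,000 GB-seconds), with memory converted at 1,024 MB per GB; prices vary by Region and architecture, so check the AWS pricing page. `lambda_monthly_cost` is a hypothetical helper:

```python
from decimal import Decimal

# Assumed prices (x86, at the time of writing; check the AWS pricing page).
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000
PRICE_PER_MILLION_REQUESTS = Decimal("0.20")
PRICE_PER_GB_SECOND = Decimal("0.0000166667")  # about $1.00 per 60,000 GB-s


def lambda_monthly_cost(requests: int, avg_duration_ms: int, memory_mb: int) -> Decimal:
    """Estimate one function's monthly bill (USD) after the free tier."""
    gb_seconds = (Decimal(requests) * Decimal(avg_duration_ms) / 1000
                  * Decimal(memory_mb) / 1024)
    billable_requests = max(requests - FREE_REQUESTS, 0)
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, Decimal(0))
    request_cost = PRICE_PER_MILLION_REQUESTS * billable_requests / 1_000_000
    compute_cost = PRICE_PER_GB_SECOND * billable_gb_seconds
    return (request_cost + compute_cost).quantize(Decimal("0.01"))


print(lambda_monthly_cost(3_000_000, 500, 1024))  # 18.73
print(lambda_monthly_cost(1_000_000, 100, 128))   # 0.00: inside the free tier
```

For 3,000,000 requests a month at 500 ms and 1,024 MB, this gives $0.40 for requests plus about $18.33 for compute (1,100,000 billable GB-seconds).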

8.6 AWS API Gateway

  • Example: building a serverless API
  • Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs.
  • Serverless and scalable
  • Supports RESTful APIs and WebSocket APIs
  • Support for security, user authentication, API throttling, API keys, monitoring...
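
A common serverless API pairs API Gateway with a Lambda function through the Lambda proxy integration: API Gateway passes the HTTP request as an event and expects a response with `statusCode`, `headers` and `body`. A minimal Python handler sketch (the `/hello?name=` route is hypothetical):

```python
import json


def lambda_handler(event, context):
    """Return a greeting for GET /hello?name=... (Lambda proxy integration)."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local test with a trimmed-down proxy event (only the field this handler uses).
print(lambda_handler({"queryStringParameters": {"name": "Ada"}}, None))
```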

8.7 AWS Batch

  • Fully managed batch processing at any scale.
  • Efficiently run 100,000s of computing batch jobs on AWS
  • A “batch” job is a job with a start and an end (opposed to continuous)
  • Batch will dynamically launch EC2 instances or Spot Instances
  • AWS Batch provisions the right amount of compute / memory
  • You submit or schedule batch jobs and AWS Batch does the rest!
  • Batch jobs are defined as Docker images and run on ECS.
  • Helpful for cost optimizations and focusing less on the infrastructure

Batch vs Lambda

  • Lambda:
    • Time limit
    • Limited runtimes
    • Limited temporary disk space
    • Serverless
  • Batch:
    • No time limit
    • Any runtime as long as it’s packaged as a Docker image
    • Relies on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)
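
The comparison above boils down to whether a job fits inside Lambda's limits. A rough rule-of-thumb sketch, assuming limits of 15 minutes, 10 GB of memory and 10 GB of `/tmp` storage (check the current Lambda quotas); `choose_compute` is hypothetical:

```python
# Assumed Lambda limits (check the current AWS Lambda quotas).
LAMBDA_MAX_MINUTES = 15     # maximum invocation time
LAMBDA_MAX_MEMORY_GB = 10   # maximum memory
LAMBDA_MAX_TMP_GB = 10      # maximum /tmp (ephemeral) storage


def choose_compute(runtime_minutes: float, memory_gb: float, scratch_disk_gb: float) -> str:
    """Rule of thumb: Lambda if the job fits its limits, otherwise AWS Batch."""
    fits_lambda = (runtime_minutes <= LAMBDA_MAX_MINUTES
                   and memory_gb <= LAMBDA_MAX_MEMORY_GB
                   and scratch_disk_gb <= LAMBDA_MAX_TMP_GB)
    return "Lambda" if fits_lambda else "Batch"


print(choose_compute(2, 0.5, 0.1))    # Lambda: short and small
print(choose_compute(180, 8, 100))    # Batch: 3-hour job with a large disk
```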

8.8 AWS Lightsail – Simple Virtual Private Servers

  • Virtual servers, storage, databases, and networking.
  • Low & predictable pricing
  • Simpler alternative to using EC2, RDS, ELB, EBS, Route 53…
  • Great for people with little cloud experience.
  • Can setup notifications and monitoring of your Lightsail resources
  • Use cases:
    • Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…)
    • Websites (templates for WordPress, Magento, Plesk, Joomla)
    • Dev / Test environment
  • Has high availability but no auto-scaling, limited AWS integrations

8.9 Other Compute - Summary

  • Docker: container technology to run applications
  • ECS: run Docker containers on EC2 instances.
  • Fargate:
    • Run Docker containers without provisioning the infrastructure.
    • Serverless offering (no EC2 instances)
  • ECR: Private Docker Images Repository.
  • Batch: run batch jobs on AWS across managed EC2 instances.
  • Lightsail: predictable & low pricing for simple application & DB stacks.

8.10 Lambda Summary

  • Lambda is Serverless, Function as a Service, seamless scaling, reactive.
  • Lambda Billing:
    • By the run time multiplied by the RAM provisioned (GB-seconds)
    • By the number of invocations
  • Language Support: many programming languages, except arbitrary Docker images
  • Invocation time: up to 15 minutes
  • Use cases:
    • Create Thumbnails for images uploaded onto S3.
    • Run a Serverless cron job.
  • API Gateway: expose Lambda functions as HTTP API.
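
The thumbnail use case is an S3-triggered function. A sketch of the event-handling part only: it reads the bucket and key from each S3 event record and computes a hypothetical `thumbnails/` destination key. The actual image resizing and upload (for example with Pillow and boto3) is left out so the example runs without AWS:

```python
import urllib.parse


def thumbnail_key(source_key: str) -> str:
    """Hypothetical naming scheme: photos/cat.jpg -> thumbnails/photos/cat.jpg."""
    return f"thumbnails/{source_key}"


def lambda_handler(event, context):
    """Handle an S3 ObjectCreated notification: one thumbnail job per record."""
    jobs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in S3 event notifications (spaces become "+").
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # A real function would download, resize and upload the image here.
        # Write thumbnails to another bucket (or exclude the thumbnails/ prefix
        # from the trigger) so the function does not re-trigger itself.
        jobs.append({"bucket": bucket, "source": key, "thumbnail": thumbnail_key(key)})
    return jobs


# Trimmed-down sample event containing only the fields used above.
sample_event = {"Records": [{"s3": {"bucket": {"name": "my-uploads"},
                                    "object": {"key": "photos/my+cat.jpg"}}}]}
print(lambda_handler(sample_event, None))
```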

Section 9: Deploying and Managing Infrastructure at Scale

9.1 What is CloudFormation

  • CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported).
  • For example, within a CloudFormation template, you say:
    • I want a security group
    • I want two EC2 instances using this security group
    • I want an S3 bucket
    • I want a load balancer (ELB) in front of these machines
  • Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify.
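
CloudFormation works out “the right order” from the references between resources. The sketch below writes the example as a template in JSON form (as a Python dict; the AMI ID is a placeholder and the load balancer is omitted for brevity) and then derives a creation order from `Ref`, `Fn::GetAtt` and `DependsOn` with a topological sort. It is a toy illustration of the idea, not CloudFormation's actual engine:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template; the AMI ID is a placeholder, not a real image.
template = {
    "Resources": {
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"GroupDescription": "Allow HTTP from anywhere"},
        },
        "AssetsBucket": {"Type": "AWS::S3::Bucket"},
    }
}
# Two EC2 instances that both use the security group above.
for name in ("WebServer1", "WebServer2"):
    template["Resources"][name] = {
        "Type": "AWS::EC2::Instance",
        "Properties": {
            "ImageId": "ami-0123456789abcdef0",
            "InstanceType": "t3.micro",
            "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
        },
    }


def references(node):
    """Yield logical IDs referenced through Ref or Fn::GetAtt anywhere in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref":
                yield value
            elif key == "Fn::GetAtt":
                yield value[0]
            else:
                yield from references(value)
    elif isinstance(node, list):
        for item in node:
            yield from references(item)


def dependencies(resource: dict, known: set) -> set:
    """Other resources this one needs first (pseudo-parameters are ignored)."""
    deps = set(references(resource.get("Properties", {})))
    depends_on = resource.get("DependsOn", [])
    deps.update([depends_on] if isinstance(depends_on, str) else depends_on)
    return deps & known


resources = template["Resources"]
graph = {name: dependencies(body, set(resources)) for name, body in resources.items()}
creation_order = list(TopologicalSorter(graph).static_order())
print(creation_order)  # the security group always comes before both instances
```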

Benefits of AWS CloudFormation

  • Infrastructure as code
    • No resources are manually created, which is excellent for control.
    • Changes to the infrastructure are reviewed through code
  • Cost
    • Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you.
    • You can estimate the costs of your resources using the CloudFormation template.
    • Savings strategy: in Dev, you could automate the deletion of stacks at 5 PM and re-create them at 8 AM, safely.
  • Productivity
    • Ability to destroy and re-create an infrastructure on the cloud on the fly
    • Automated generation of diagrams for your templates!
    • Declarative programming (no need to figure out ordering and orchestration).
  • Don’t re-invent the wheel
    • Leverage existing templates on the web!
    • Leverage the documentation
  • Supports (almost) all AWS resources:
    • Everything we’ll see in this course is supported
    • You can use “custom resources” for resources that are not supported

9.2 AWS Cloud Development Kit (CDK)

  • Define your cloud infrastructure using a familiar language:
    • JavaScript/TypeScript, Python, Java, and .NET
  • The code is “compiled” into a CloudFormation template (JSON/YAML)
  • You can therefore deploy infrastructure and application runtime code together.
    • Great for Lambda functions
    • Great for Docker containers in ECS / EKS

9.3 AWS Elastic Beanstalk

  • Elastic Beanstalk is a developer-centric view of deploying an application on AWS.
  • It uses all the components we’ve seen before: EC2, ASG, ELB, RDS, etc…
  • But it’s all in one view that’s easy to make sense of!
  • We still have full control over the configuration
  • Beanstalk = Platform as a Service (PaaS)
  • Beanstalk is free but you pay for the underlying instances
  • Managed service
    • Instance configuration / OS is handled by Beanstalk
    • Deployment strategy is configurable but performed by Elastic Beanstalk
    • Capacity provisioning
    • Load balancing & auto-scaling
    • Application health-monitoring & responsiveness
  • Just the application code is the responsibility of the developer.
  • Three architecture models:
    • Single Instance deployment: good for dev
    • LB + ASG: great for production or pre-production web applications
    • ASG only: great for non-web apps in production (workers, etc.)

9.4 AWS CodeDeploy

  • We want to deploy our application automatically
  • Works with EC2 Instances
  • Works with On-Premises Servers
  • Hybrid service
  • Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent.

9.5 AWS CodeCommit

  • Before pushing the application code to servers, it needs to be stored somewhere.
  • Developers usually store code in a repository, using the Git technology.
  • A famous public offering is GitHub; AWS’ competing product is CodeCommit
  • CodeCommit:
    • Source-control service that hosts Git-based repositories
    • Makes it easy to collaborate with others on code
    • The code changes are automatically versioned
  • Benefits:
    • Fully managed
    • Scalable & highly available
    • Private, Secured, Integrated with AWS

9.6 AWS CodeBuild

  • Code building service in the cloud (the name is obvious).
  • Compiles source code, runs tests, and produces packages that are ready to be deployed (by CodeDeploy for example).
  • Benefits:
    • Fully managed, serverless
    • Continuously scalable & highly available
    • Secure
    • Pay-as-you-go pricing – only pay for the build time

9.7 AWS CodePipeline

  • Orchestrate the different steps to have the code automatically pushed to production.
    • Code => Build => Test => Provision => Deploy
    • Basis for CICD (Continuous Integration & Continuous Delivery)
  • Benefits:
    • Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, 3rd-party services (GitHub…) & custom plugins…
    • Fast delivery & rapid updates

9.8 AWS CodeArtifact

  • Software packages depend on each other to be built (also called code dependencies), and new ones are created.
  • Storing and retrieving these dependencies is called artifact management
  • Traditionally you need to set up your own artifact management system
  • CodeArtifact is a secure, scalable, and cost-effective artifact management service for software development.
  • Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet
  • Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact.

9.9 AWS Systems Manager (SSM)

  • Helps you manage your EC2 and On-Premises systems at scale
  • Another Hybrid AWS service
  • Get operational insights about the state of your infrastructure
  • Suite of 10+ products
  • Most important features are:
    • Patching automation for enhanced compliance.
    • Run commands across an entire fleet of servers.
    • Store parameter configuration with the SSM Parameter Store
  • Works for both Windows and Linux OS

1. How Systems Manager works

  • We need to install the SSM agent onto the systems we control.
  • Installed by default on Amazon Linux AMI & some Ubuntu AMI
  • If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent!
  • Thanks to the SSM agent, we can run commands, patch & configure our servers

2. Systems Manager – SSM Session Manager

  • Allows you to start a secure shell on your EC2 and on-premises servers.
  • No SSH access, bastion hosts, or SSH keys needed
  • No port 22 needed (better security)
  • Supports Linux, macOS, and Windows
  • Send session log data to S3 or CloudWatch Logs

9.10 AWS OpsWorks

  • Chef & Puppet help you perform server configuration automatically, or repetitive actions.
  • They work great with EC2 & On-Premises VMs
  • AWS OpsWorks = Managed Chef & Puppet
  • It’s an alternative to AWS SSM
  • Only provision standard AWS resources:
    • EC2 Instances, Databases, Load Balancers, EBS volumes…
  • In the exam: Chef or Puppet needed => AWS OpsWorks

9.11 Deployment - Summary

  • CloudFormation: (AWS only)
    • Infrastructure as Code, works with almost all of AWS resources. 基础设施即代码,几乎适用于所有 AWS 资源。
    • Repeat across Regions & Accounts
  • Beanstalk: (AWS only)
    • Platform as a Service (PaaS), limited to certain programming languages or Docker
    • Deploy code consistently with a known architecture: e.g., ALB + EC2 + RDS
    • Can be used to monitor and to check the health of an environment.
  • CodeDeploy (hybrid): deploy & upgrade any application onto servers
  • Systems Manager (hybrid): patch, configure and run commands at scale
  • OpsWorks (hybrid): managed Chef and Puppet in AWS.
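To make "Infrastructure as Code" concrete, here is a minimal CloudFormation template built as a Python dict and serialized to JSON. It is a sketch, not a recommended production template. The logical ID `AppBucket`, the `env` tag, and the environment names are illustrative assumptions. The resource type `AWS::S3::Bucket`, `VersioningConfiguration`, and `Ref` are standard CloudFormation syntax.

```python
import json


def bucket_template(environment: str) -> str:
    """Return a CloudFormation template (JSON) that declares one versioned S3 bucket.

    The same template text can be deployed unchanged in any Region or account,
    which is what "Repeat across Regions & Accounts" means in practice.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Versioned bucket for the {environment} environment",
        "Resources": {
            # Logical ID (hypothetical name) -> resource type + properties
            "AppBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "Tags": [{"Key": "env", "Value": environment}],
                },
            }
        },
        "Outputs": {
            # For an S3 bucket, Ref returns the generated bucket name
            "BucketName": {"Value": {"Ref": "AppBucket"}}
        },
    }
    return json.dumps(template, indent=2)
```

After writing the output to a file, you could deploy the same template to several Regions with, for example, `aws cloudformation deploy --template-file template.json --stack-name my-stack --region eu-west-1`. The stack name and Region here are placeholders.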

9.12 Developer Services - Summary

  • CodeCommit: Store code in a private Git repository (version controlled)
  • CodeBuild: Build & test code in AWS
  • CodeDeploy: Deploy code onto servers. Automates code deployments to any instance, including Amazon EC2 instances and instances running on-premises
  • CodePipeline: Orchestration of the pipeline (from code to build to deploy)
  • CodeArtifact: Store software packages / dependencies on AWS
  • CodeStar: Unified view that lets developers do CI/CD and code in one place. Used to quickly develop, build, and deploy applications on AWS
  • Cloud9: Cloud IDE (Integrated Development Environment) with collaboration features
  • AWS CDK: Define your cloud infrastructure using a programming language

Section 10: Global Infrastructure

10.1 Why make a global application?

  • A global application is an application deployed in multiple geographies
  • On AWS: this could be Regions and / or Edge Locations
  • Decreased Latency
    • Latency is the time it takes for a network packet to reach a server
    • It takes time for a packet from Asia to reach the US
    • Deploy your applications closer to your users to decrease latency and give them a better experience
  • Disaster Recovery (DR)
    • If an AWS Region goes down (earthquake, storms, power shutdown, politics), you can fail over to another Region and keep your application working
    • A DR plan is important to increase the availability of your application
  • Attack protection: distributed global infrastructure is harder to attack
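The latency and disaster-recovery benefits above come down to one decision: send each user to the closest Region that is still up. The following toy model illustrates that decision. It is not how any AWS service is implemented, since Route 53 routing and health checks make this choice for you. The Region names are real, but the latency figures in the usage example are hypothetical.

```python
from typing import Dict, Set


def pick_region(latency_ms: Dict[str, float], unhealthy: Set[str]) -> str:
    """Return the lowest-latency Region that is not marked as unhealthy.

    - Lower latency: among healthy Regions, the one closest to the user wins.
    - Disaster recovery: a Region that is down is skipped (fail over).
    """
    healthy = {region: ms for region, ms in latency_ms.items() if region not in unhealthy}
    if not healthy:
        raise RuntimeError("no healthy Region available")
    return min(healthy, key=healthy.get)
```

For a user in Asia with hypothetical latencies `{"us-east-1": 180.0, "ap-northeast-1": 35.0}`, the function returns `ap-northeast-1`. If that Region is marked unhealthy, traffic fails over to `us-east-1`.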

10.2 Amazon Route 53

  • Route 53 is a Managed DNS (Domain Name System)
  • DNS is a collection of rules and records that helps clients understand how to reach a server through its domain name (the hostname in a URL)
  • In AWS, the most common records are:
    • www.google.com => 12.34.56.78 == A record (IPv4)
    • www.google.com => 2001:0db8:85a3:0000:0000:8a2e:0370:7334 == AAAA record (IPv6)
    • search.google.com => www.google.com == CNAME: hostname to hostname
    • example.com => AWS resource == Alias (ex: ELB, CloudFront, S3 website, API Gateway, etc…)
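The following toy model shows how a client follows these records, reusing the example names above. It is only an illustration of how the records chain: a CNAME points to another hostname, and resolution continues until an A record yields an IPv4 address. Only A and CNAME are modeled, and the `RECORDS` table is hypothetical. A real resolver also queries by record type (A vs AAAA). Unlike a CNAME, an Alias is resolved by Route 53 itself, which answers directly with the target AWS resource's IPs.

```python
from typing import Dict, Tuple

# Hypothetical zone data based on the examples above: name -> (record type, value)
RECORDS: Dict[str, Tuple[str, str]] = {
    "www.google.com": ("A", "12.34.56.78"),
    "search.google.com": ("CNAME", "www.google.com"),
}


def resolve(name: str, records: Dict[str, Tuple[str, str]], max_hops: int = 8) -> str:
    """Follow CNAME records until an A record (IPv4 address) is reached."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value  # hostname -> IPv4 address: done
        if record_type == "CNAME":
            name = value  # hostname -> hostname: look up the target next
            continue
        raise ValueError(f"unsupported record type {record_type}")
    raise RuntimeError("too many CNAME hops (possible loop)")
```

For example, `resolve("search.google.com", RECORDS)` follows the CNAME to `www.google.com` and returns its A record value.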

10.3 Amazon CloudFront

  • Content Delivery Network (CDN)
  • Improves read performance, content is cached at the edge
  • Improves user experience
  • 216 Points of Presence globally (edge locations)
  • DDoS protection (because worldwide), integration with AWS Shield and AWS WAF (Web Application Firewall) web access control lists

1. CloudFront – Origins

  • S3 bucket
    • For distributing files and caching them at the edge
    • Enhanced security with CloudFront Origin Access Control (OAC)
    • OAC is replacing Origin Access Identity (OAI)
    • CloudFront can be used as an ingress (to upload files to S3)
  • Custom Origin (HTTP)
    • Application Load Balancer
    • EC2 instance
    • S3 website (must first enable the bucket as a static S3 website)
    • Any HTTP backend you want

2. CloudFront vs S3 Cross Region Replication

  • CloudFront:
    • Global Edge network
    • Files are cached for a TTL (maybe a day)
    • Great for static content that must be available everywhere
  • S3 Cross Region Replication:
    • Must be set up for each region you want replication to happen
    • Files are updated in near real-time
    • Read only
    • Great for dynamic content that needs to be available at low latency in a few regions
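The trade-off above comes from TTL-based caching. An edge location keeps serving its copy until the TTL expires, even if the origin has changed, whereas replication pushes updates in near real-time. The following toy model of a single edge cache illustrates this. It is not CloudFront's actual implementation. The injected `clock` and the one-day TTL in the usage note are assumptions made so the behavior is easy to observe.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge location caching origin objects for a fixed TTL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_hits = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now < cached[1]:
            return cached[0]  # cache hit: served from the edge, origin not contacted
        body = self._fetch(path)  # miss or expired entry: go back to the origin
        self.origin_hits += 1
        self._store[path] = (body, now + self._ttl)
        return body
```

With a TTL of 86400 seconds (one day), an object updated at the origin one hour after it was first cached is still served in its old version by this edge. The new version appears only once the day has passed. This staleness is why the notes pair CloudFront with static content and replication with dynamic content.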

10.4 AWS Global Accelerator

  • Improve global application availability and performance using the AWS global network
  • Leverage the AWS internal network to optimize the route to your application (60% improvement)
  • 2 Anycast IPs are created for your application and traffic is sent through Edge Locations
  • The Edge locations send the traffic to your application

AWS Global Accelerator vs CloudFront

  • They both use the AWS global network and its edge locations around the world
  • Both services integrate with AWS Shield for DDoS protection
  • CloudFront – Content Delivery Network
    • Improves performance for your cacheable content (such as images and videos)
    • Content is served at the edge
  • Global Accelerator
    • No caching, proxying packets at the edge to applications running in one or more AWS Regions
    • Improves performance for a wide range of applications over TCP or UDP
    • Good for HTTP use cases that require static IP addresses
    • Good for HTTP use cases that require deterministic, fast regional failover

10.5 AWS Outposts

  • Hybrid Cloud: businesses that keep an on-premises infrastructure alongside a cloud infrastructure
  • Therefore, two ways of dealing with IT systems:
    • One for the AWS cloud (using the AWS console, CLI, and AWS APIs)
    • One for their on-premises infrastructure
  • AWS Outposts are “server racks” that offer the same AWS infrastructure, services, APIs & tools to build your own applications on-premises just as in the cloud
  • AWS will set up and manage “Outposts Racks” within your on-premises infrastructure and you can start leveraging AWS services on-premises
  • You are responsible for the Outposts Rack physical security

10.6 AWS Wavelength

  • Wavelength Zones are infrastructure deployments embedded within the telecommunications providers’ datacenters at the edge of the 5G networks
  • Brings AWS services to the edge of the 5G networks
  • Example: EC2, EBS, VPC…
  • Ultra-low latency applications through 5G networks
  • Traffic doesn’t leave the Communication Service Provider’s (CSP) network
  • High-bandwidth and secure connection to the parent AWS Region
  • No additional charges or service agreements
  • Use cases: Smart Cities, ML-assisted diagnostics, Connected Vehicles, Interactive Live Video Streams, AR/VR, Real-time Gaming, …

10.7 AWS Local Zones

  • Places AWS compute, storage, database, and other selected AWS services closer to end users to run latency-sensitive applications
  • Extend your VPC to more locations – “Extension of an AWS Region”
  • Compatible with EC2, RDS, ECS, EBS, ElastiCache, Direct Connect …
  • Example:
    • AWS Region: N. Virginia (us-east-1)
    • AWS Local Zones: Boston, Chicago, Dallas, Houston, Miami, …

10.8 Global Applications Architecture

10.9 Global Applications in AWS - Summary

  • Global DNS: Route 53
    • Great to route users to the closest deployment with the least latency
    • Great for disaster recovery strategies
  • Global Content Delivery Network (CDN): CloudFront
    • Replicate part of your application to AWS Edge Locations – decrease latency
    • Cache common requests – improved user experience and decreased latency
  • S3 Transfer Acceleration
    • Accelerate global uploads & downloads into Amazon S3
  • AWS Global Accelerator
    • Improve global application availability and performance using the AWS global network
  • AWS Outposts
    • Deploy Outposts Racks in your own Data Centers to extend AWS services
  • AWS Wavelength
    • Brings AWS services to the edge of the 5G networks
    • Ultra-low latency applications
  • AWS Local Zones
    • Bring AWS resources (compute, database, storage, …) closer to your users
    • Good for latency-sensitive applications