Posts

ML Exam - 5 Products

Data Analytics

Data Analysis & Visualization
- QuickSight = interactive dashboards and reports over data.
- Athena = serverless SQL on S3 for ad-hoc queries and data lake analysis. Cost-effective. Runs SQL queries in parallel.
- Redshift = fully managed data warehouse for structured or semi-structured data. Mnemonic: Oracle was "Big Red", so think of shifting away from Oracle data warehouses. Scalable, with a pay-as-you-go pricing model. Runs SQL across data warehouses, data lakes, and operational DBs. Can run either provisioned or serverless (no provisioning).

Data Pipelines
- Kinesis Data Streams for real-time data from apps, streams, and sensors. Auto provisioning and scaling in on-demand mode. Kinesis is for real-time streaming event data and instant analytics/metrics over those streams.
- Data Firehose for near real-time data. Fully managed service. Auto provisioning and scaling. Delivers data to storage and other services.

Data Processing
- Glue is serverless...

ML Exam - 4 Data

Types of Structure of Data

Properties of Data (4 Vs)
1. Volume (Size) - GBs? PBs?…
2. Velocity - High velocity → Real-Time or near-RT processing
3. Variety - Structured? Mixed? Multiple sources? Multiple formats?
4. Veracity?

Data Warehouses, Data Lakes, Data Lakehouses
- Data Warehouse (DWH) (e.g. Amazon Redshift) - Centralized repository optimized for analysis (read-heavy operations) where data from different sources is stored in a structured format
- Data Lake (e.g. Amazon S3 can be used as a data lake) - Storage repository that holds vast amounts of raw data in its native format (predefined structure is not necessary). Structured, semi-structured, & unstructured data
- Often, organizations use a combination of both, ingesting raw data into a data lake and then processing and moving refined data into a data warehouse for analysis
- Data Lakehouse (e.g. AWS Lake Formation with S3 & Redshift Spectrum) - Hybrid data ar...

ML Exam - 3 SageMaker AI

SageMaker AI is the “heart” of the MLA-C01 certification. The majority of exam questions deal with SageMaker, and knowing it inside and out is essential to doing well in the exam. It is important to distinguish between SageMaker Processing, SageMaker Training, and SageMaker Hosting, which cover different aspects of the end-to-end ML process.

These notes first cover generic ML knowledge and concepts, and then their implementation in AWS (usually involving SageMaker and other AWS services). Some open-source Apache projects like Hadoop or Spark are also covered, since they are popular in ML environments and well supported in AWS.

It is a good idea to review the high-level overview of SageMaker from the foundational AIF-C01 certification. MLA-C01 builds on top of that knowledge.

Intro to SageMaker AI
- AWS service that can handle the whole E2E process in ML
- E2E ML process = Data process...