Posts

ML Exam Prep - Products

ML Exam Prep - AWS Products

- Analytics: Amazon Athena, Amazon Data Firehose, Amazon EMR, AWS Glue, AWS Glue DataBrew, AWS Glue Data Quality, Amazon Kinesis, AWS Lake Formation, Amazon Managed Service for Apache Flink, Amazon OpenSearch Service, Amazon Quick, Amazon Redshift
- Application Integration: Amazon EventBridge, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon SNS, Amazon SQS, AWS Step Functions
- Cloud Financial Management: AWS Billing and Cost Management, AWS Budgets, AWS Cost Explorer
- Compute: AWS Batch, Amazon EC2, AWS Lambda, AWS Serverless Application Repository
- Containers: Amazon ECR, Amazon ECS, Amazon EKS
- Database: Amazon DocumentDB, Amazon DynamoDB, Amazon ElastiCache, Amazon Neptune, Amazon RDS
- Developer Tools: AWS CDK, AWS CodeArtifact, AWS CodeBuild, AWS CodeDeploy, AWS CodePipeline, AWS X-Ray
- Machine Learning: Amazon Augmented AI (Amazon A2I), Amazon Bedrock, Amazon CodeGuru, Amazon Comprehend, Amazon Comprehend Medical, Amazon DevOps Guru, Amazon Fraud Detector, AWS HealthLake, Amazon Ken...

ML Exam Prep - Data

ML Exam Prep - Data

Types of Data Structure

Properties of Data (the 4 Vs)
1. Volume (size) - GBs? PBs? …
2. Velocity - high velocity → real-time or near-real-time processing
3. Variety - structured? mixed? multiple sources? multiple formats?
4. Veracity - how trustworthy and accurate is the data?

Data Warehouses, Data Lakes, Data Lakehouses
- Data Warehouse (DWH) (e.g. Amazon Redshift): centralized repository optimized for analysis (read-heavy operations), where data from different sources is stored in a structured format
- Data Lake (e.g. Amazon S3 can be used as a data lake): storage repository that holds vast amounts of raw data in its native format (a predefined structure is not necessary); holds structured, semi-structured, and unstructured data
- Often, organizations use a combination of both, ingesting raw data into a data lake and then processing and moving refined data into a data warehouse for analysis
- Data Lakehouse (e.g. AWS Lake Formation with S3 & Redshift Spectrum): hybrid data...

ML Exam Prep - SageMaker AI

ML Associate Exam Prep - SageMaker AI

SageMaker AI is the "heart" of the MLA-C01 certification. The majority of exam questions involve SageMaker, and knowing it inside and out is essential to do well in the exam. It is important to understand and discern between SageMaker Processing, SageMaker Training, and SageMaker Hosting, which cover different aspects of the end-to-end ML process. These notes first cover generic ML knowledge and concepts, and then their implementation in AWS (usually involving SageMaker and other AWS services). Some open-source Apache projects like Hadoop or Spark are also covered, since they are popular in ML environments and are well supported in AWS. It is a good idea to review the high-level overview of SageMaker from the foundational AIF-C01 certification; MLA-C01 builds on top of that knowledge.

Intro to SageMaker AI
AWS service that can handle the whole E2E process in ML. E2E ML process =...

ML Exam Prep - Transformers and LLMs

ML Associate Exam Prep - Transformers and LLMs

Basic Concepts - Tokens and Embeddings
- Tokens = numerical representations of words or parts of words
- A word can consist of one or more tokens
- Punctuation marks (. " ,) are also usually tokens
- Words and tokens can loosely be thought of as the same thing, although strictly speaking they are different
- Embeddings = mathematical representations (vectors) that encode the "meaning" of a token

Evolution of the Transformer Architecture
1. RNNs and LSTMs
Recurrent Neural Networks (RNNs) are AI models designed for sequential data - like text or time series - that use internal memory to process inputs in order. Long Short-Term Memory (LSTM) networks are a specialized, advanced type of RNN created to solve the "vanishing gradient" problem, allowing them to learn long-term dependencies that standard RNNs forget. RNNs and LSTMs have been made largely obsolete by Transformers for many NLP tasks, though they remain rele...
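The token/embedding relationship above can be sketched as a toy in plain Python. This is not any real tokenizer: the vocabulary, the "##" subword convention, and the embedding values are all made up for illustration, and real models use vocabularies of tens of thousands of tokens with embeddings of hundreds of dimensions.

```python
# Toy sketch: a token is an integer ID; an embedding is a vector per token.
# Vocabulary and vector values are invented for illustration only.
vocab = {"trans": 0, "##formers": 1, "are": 2, "great": 3, ".": 4}

embeddings = [
    [0.1, 0.3],  # "trans"
    [0.2, 0.1],  # "##formers"
    [0.9, 0.4],  # "are"
    [0.5, 0.7],  # "great"
    [0.0, 0.2],  # "."
]

def tokenize(pieces):
    """Map each word piece to its token ID; one word may yield 1+ tokens."""
    return [vocab[p] for p in pieces]

# "transformers" is one word but two tokens; "." is its own token.
ids = tokenize(["trans", "##formers", "are", "great", "."])
vectors = [embeddings[i] for i in ids]
print(ids)         # [0, 1, 2, 3, 4]
print(vectors[0])  # the "meaning" vector for the token "trans"
```

The point of the sketch: the model never sees text, only the integer IDs, and the embedding table is what turns each ID into a vector the network can compute with.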

ML Exam Prep

ML Associate Exam Prep - 1: Data Prep for ML (28%)

Task 1.1: Ingest and store data

Knowledge of:
* Data formats and ingestion mechanisms (ex: validated and non-validated formats, Apache Parquet, JSON, CSV, Apache ORC, Apache Avro, RecordIO)
* Use of the core data sources (ex: S3, EFS, FSx for NetApp ONTAP)
* Use of streaming data sources to ingest data (ex: Kinesis, Apache Flink, Apache Kafka)
* Storage options, including use cases and tradeoffs

Skills in:
* Extracting data from storage (ex: S3, EBS, EFS, RDS, DynamoDB) by using relevant service features (ex: S3 Transfer Acceleration, EBS Provisioned IOPS)
* Choosing appropriate data formats (ex: Parquet, JSON, CSV, ORC) based on data access patterns
* Ingesting data into SageMaker Data Wrangler and SageMaker Feature Store
* Merging data from multiple sources (ex: programming, Glue, or Apache Spark)
* Troubleshooting and debugging data ingestion and storage issues that involve capacity and scalability
* Making initial storage decisions based on cost, perfor...
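Two of the formats named above are easy to contrast with the standard library alone: CSV is flat, row-oriented text with the schema in a header row, while JSON is self-describing and supports nesting but repeats key names in every record. (Parquet and ORC are columnar binary formats and need extra libraries such as pyarrow, so this sketch sticks to the text formats; the sample records are made up.)

```python
import csv
import io
import json

# Made-up sample records for illustration.
records = [
    {"id": 1, "feature": 0.25, "label": "cat"},
    {"id": 2, "feature": 0.75, "label": "dog"},
]

# CSV: row-oriented, compact, but untyped text; header row carries the schema.
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["id", "feature", "label"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buf.getvalue()

# JSON: self-describing and nestable, but keys repeat per record.
json_text = json.dumps(records)

print(csv_text)
print(json_text)

# Round-trip check: CSV loses types (everything comes back as strings),
# while JSON preserves numbers.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(csv_rows[0]["feature"]))        # <class 'str'>
print(type(json.loads(json_text)[0]["feature"]))  # <class 'float'>
```

The round-trip at the end is the practical tradeoff the exam outline hints at: CSV needs a schema applied on read, whereas JSON (and the binary columnar formats) carry type information with the data.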

Dapper with C#

Dapper with C#

Dapper is best used when you prioritize maximum performance and total control over SQL. While Entity Framework (EF) Core is a feature-rich Object-Relational Mapper (ORM) designed for productivity, Dapper is a "micro-ORM" that provides a thin, high-speed layer over ADO.NET.

Key Reasons to Choose Dapper Over EF Core
- Superior performance: Dapper has minimal overhead because it does not perform change tracking, LINQ-to-SQL translation, or complex entity materialization. Benchmarks often show Dapper is significantly faster, especially for large datasets or high-frequency read operations.
- Full SQL control: You write raw SQL directly, giving you complete flexibility to use database-specific features like Common Table Expressions (CTEs), window functions, or specialized joins that might be difficult to express in LINQ.
- Reduced memory allocation: Because it lacks the heavy state-management infrastructure of EF Core, Dapper typically consumes less ...
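A minimal sketch of the "thin layer over ADO.NET" idea, assuming the Dapper and Microsoft.Data.Sqlite NuGet packages; the Product table, record type, and repository are made up for illustration, not a prescribed pattern:

```csharp
using System.Collections.Generic;
using Dapper;                 // NuGet: Dapper
using Microsoft.Data.Sqlite;  // NuGet: Microsoft.Data.Sqlite

public record Product(int Id, string Name, decimal Price);

public class ProductRepository
{
    private readonly string _connectionString;

    public ProductRepository(string connectionString) =>
        _connectionString = connectionString;

    // Dapper maps each result row straight onto a Product via Query<T>;
    // no change tracking or LINQ translation happens - just raw SQL
    // over an ADO.NET connection, with @MinPrice bound as a parameter.
    public IEnumerable<Product> GetAbovePrice(decimal minPrice)
    {
        using var conn = new SqliteConnection(_connectionString);
        return conn.Query<Product>(
            "SELECT Id, Name, Price FROM Product WHERE Price >= @MinPrice",
            new { MinPrice = minPrice });
    }
}
```

Note that you own the SQL string entirely, so database-specific features (CTEs, window functions) drop in without fighting a LINQ translator, but schema changes are on you to keep in sync.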

Ground Zero Agile Project

Ground Zero Agile Project

So you have chosen to do a project the Agile way and have read the Agile Manifesto. What now? What is the plan? This article proposes a "Ground Zero" project.

Every Agile project must have: 1) a build that is deployable to the target computer system for demos; 2) a manifest that makes it deployable but flexible; 3) a developer team that is ready to go and proficient with its programming language (if not, we need to train them) and its code generator or AI; 4) a source control app governing CI/CD; 5) user stories written for the first sprint.

All Agile projects for software vendors also need a common company app (for the company's eventual suite of products) that: 1) handles the admin piece (where the first installer sets up the administrator user and some other users); 2) shows a license screen; 3) shows the common expected user interface layout (so the team gets to practice with it).

Suggestio...