Posts

ML Exam Prep: 8 - Model Fitting

1. Overfitting (Model is too complex)

The Problem: The model memorizes the training data (including its noise and random errors) instead of learning the underlying patterns. It scores 99% on training data but performs terribly on new data.

How to Prevent It:
* Increase Regularization: Penalizes large weights; L1 (Lasso) zeroes out weak features, L2 (Ridge) shrinks them.
* Increase Dropout: Randomly shuts off neurons during training to force generalized learning.
* Fewer Feature Combinations: Removes complex or noisy inputs to stop hyper-specific conclusions.
* Early Stopping: Halts training once validation loss begins to rise.
* Data Augmentation: Tweaks existing training samples (e.g., rotating images) to create new variety.

2. Underfitting (Model is too simple)

The Problem: The model fails to capture the patterns in the data, performing poorly on both training and testing sets.

How to Prevent It:
* Increase Model Complexity: Add layers/neurons or switch to a stronger algorithm.
* Decrease Regularization: ...
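The difference between L1 and L2 regularization noted above can be illustrated with a minimal sketch. It assumes the simplest textbook setting (an orthonormal design matrix), where the Lasso solution is a soft-thresholding of the unregularized weights and the Ridge solution is a uniform rescaling. The function names, the example weights, and the penalty strength `lam` are hypothetical and chosen only for illustration.

```python
def l1_shrink(weights, lam):
    """Soft-thresholding: the Lasso (L1) solution under an orthonormal design.

    Weights whose magnitude is at most `lam` become exactly zero;
    the rest move toward zero by `lam`.
    """
    shrunk = []
    for w in weights:
        if w > lam:
            shrunk.append(w - lam)
        elif w < -lam:
            shrunk.append(w + lam)
        else:
            shrunk.append(0.0)
    return shrunk


def l2_shrink(weights, lam):
    """Ridge (L2) solution under an orthonormal design.

    Every weight is scaled by 1 / (1 + lam); none becomes exactly zero.
    """
    return [w / (1.0 + lam) for w in weights]


# Hypothetical unregularized weights: two strong features, two weak ones.
ols_weights = [2.0, -0.3, 0.1, -1.5]

lasso_weights = l1_shrink(ols_weights, lam=0.5)  # weak features are zeroed out
ridge_weights = l2_shrink(ols_weights, lam=0.5)  # all features shrink, none vanish

print("L1:", lasso_weights)
print("L2:", ridge_weights)
```

In this sketch the two weak weights drop to exactly zero under L1, while L2 keeps all four features with smaller magnitudes, which is why L1 is often described as performing feature selection.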

ML Exam: 7 - End-to-End Process

1. Define Business Problem and Data Objectives
* Pick the core metric to optimize (e.g., churn rate, fraud detection).
* Determine whether the problem requires supervised, unsupervised, or reinforcement learning.
* Map out data availability, regulatory compliance boundaries, and project success metrics.

2. Data Ingestion and Collection
* Aggregate raw structured, semi-structured, or unstructured data into cloud storage.
* Use S3 as the centralized data lake landing zone.
* Import streaming data in real time using Kinesis.
* Extract relational database data using Glue or DMS.

3. Data Cleansing and Preparation
* Clean raw datasets by handling missing values, filtering duplicates, and removing outliers.
* Transform features using SageMaker Data Wrangler to visually profile data quality.
* Standardize and normalize numeric data, tokenize text data, or resize images for computer vision.
* Store fully processed, reusable data features in the SageMaker Feature Store.

4. Data Labeling and A...
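The "standardize" and "normalize" steps mentioned under Data Cleansing and Preparation can be sketched in plain Python. This is an illustrative sketch only, not how SageMaker Data Wrangler implements its transforms; the function names and the sample `ages` column are hypothetical.

```python
import statistics


def standardize(values):
    """Z-score standardization: rescale to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Min-max normalization: rescale linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric feature column.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]

z_scores = standardize(ages)
scaled = min_max_normalize(ages)

print("standardized:", z_scores)
print("normalized:", scaled)
```

Both functions assume the column is not constant; a constant column would give a zero denominator and needs separate handling.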

ML Exam: 6 - Feature Engineering

Feature Engineering Tools
* SageMaker Data Wrangler
* SageMaker Canvas
These services offer over 300 built-in transformations.

Feature Engineering - Basic Concepts
* Applying domain knowledge (knowledge of your data and of your model) to create better features to train your model is the ART OF ML!
* It is the most critical part of a good ML implementation. Expert ML specialists are good at feature engineering.
* Curse of dimensionality: more features are not necessarily better! Every feature is a new dimension.
* Much of feature engineering is selecting the most relevant features → domain knowledge comes into play.
* Unsupervised dimensionality reduction can help (PCA, K-Means).

Feature Engineering - Techniques
Data Cleansing:
* Missing Value Imputation: Replaces NULL with the mean, median, or a custom placeholder.
* Outlier Detection: Finding anomalies by standard deviation or Interquartile Range (IQR) formulas...
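The two data cleansing techniques above, missing value imputation and IQR-based outlier detection, can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the sample `raw` column is hypothetical, the median is used as the imputation value, and the 1.5 × IQR fence is the common convention rather than something specified in the text.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier (an assumption, not from the text).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical feature column with one missing value and one extreme value.
raw = [10.0, 12.0, None, 11.0, 13.0, 12.0, 95.0]

imputed = impute_median(raw)
outliers = iqr_outliers(imputed)

print("imputed:", imputed)
print("outliers:", outliers)
```

The median is used here instead of the mean because the extreme value would pull the mean upward, which is one reason median imputation is often preferred when outliers are present.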