Posts

ML Exam Prep: 8 - Model Fitting

1. Overfitting (Model is too complex)
The Problem: The model memorizes the training data (including its noise and random errors) instead of learning the underlying patterns. It may score 99% on training data but perform terribly on new data.
How to Prevent It:
Increase Regularization: Penalizes large weights; L1 (Lasso) zeroes out weak features, L2 (Ridge) shrinks them.
Increase Dropout: Randomly shuts off neurons during training to force generalized learning.
Fewer Feature Combinations: Removes complex or noisy inputs to stop hyper-specific conclusions.
Early Stopping: Halts training once validation loss begins to rise.
Data Augmentation: Tweaks existing training samples (e.g., rotating images) to create new variety.
2. Underfitting (Model is too simple)
The Problem: The model fails to capture the basic data patterns, performing poorly on both training and testing sets.
How to Prevent It:
Increase Model Complexity: Add layers/neurons or switch to a stronger algorithm.
Decreas...
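The early-stopping rule in the overfitting list can be sketched in a few lines of plain Python. This is only an illustration, not the post's own code: the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are all assumptions made up for the example. With `patience=1` the rule stops the moment validation loss rises; larger values tolerate a few noisy epochs first.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once validation loss has failed to improve for
    `patience` consecutive epochs. If that never happens, the index
    of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation-loss curve: improves, then starts to rise.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch = early_stopping_epoch(losses, patience=2)

# The weights worth keeping are those from the best epoch seen so far.
best_epoch = min(range(stop_epoch + 1), key=lambda i: losses[i])
print(stop_epoch, best_epoch)
```

In a real training loop the same counter would sit inside the epoch loop, and the model checkpoint from the best epoch would be restored after stopping.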

ML Exam: 7 - End-to-End Process

1. Define Business Problem and Data Objectives
Pick the core metric to optimize (e.g., churn rate, fraud detection).
Determine whether the problem requires supervised, unsupervised, or reinforcement learning.
Map out data availability, regulatory compliance boundaries, and project success metrics.
2. Data Ingestion and Collection
Aggregate raw structured, semi-structured, or unstructured data into cloud storage.
Use S3 as the centralized data lake landing zone.
Ingest streaming data in real time using Kinesis.
Extract relational database data using Glue or DMS.
3. Data Cleansing and Preparation
Clean raw datasets by handling missing values, filtering duplicates, and removing outliers.
Transform features using SageMaker Data Wrangler to visually profile data quality.
Standardize and normalize numeric data, tokenize text, or resize images for computer vision.
Store fully processed, reusable data features in the SageMaker Feature Store.
4. Data Labeling and A...
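The three cleansing actions named in step 3 (handling missing values, filtering duplicates, removing outliers) can be illustrated without any AWS tooling. The sketch below is an assumption-laden toy, not how SageMaker Data Wrangler works internally: the function `clean_rows`, the `(customer_id, amount)` row layout, median imputation, and the 1.5 x IQR outlier rule are all choices made for the example.

```python
from statistics import median, quantiles


def clean_rows(rows):
    """Clean a list of (customer_id, amount) tuples.

    Assumed policy for this sketch: drop exact duplicates, fill
    missing amounts with the median, then drop IQR outliers.
    """
    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    deduped = list(dict.fromkeys(rows))

    # Step 2: impute missing amounts (None) with the median of observed ones.
    observed = [amount for _, amount in deduped if amount is not None]
    fill = median(observed)
    imputed = [(cid, fill if amount is None else amount) for cid, amount in deduped]

    # Step 3: remove rows whose amount lies outside 1.5 * IQR of the quartiles.
    q1, _, q3 = quantiles([amount for _, amount in imputed], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(cid, amount) for cid, amount in imputed if low <= amount <= high]


# Hypothetical raw data: one duplicate, one missing value, one extreme value.
raw = [
    ("a", 10.0), ("b", 12.0), ("b", 12.0), ("c", None),
    ("d", 11.0), ("e", 13.0), ("f", 500.0),
]
cleaned = clean_rows(raw)
print(cleaned)
```

The order of the steps matters: imputing before deduplication would let repeated rows skew the median, and the right outlier rule depends on the data and the business problem.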

ML Exam: 6 - Feature Engineering

Feature Engineering - Basic Concepts
Applying domain knowledge (your knowledge of the data and of the model you're using) to create better features to train your model
ART OF ML!!
Most critical part of a good ML implementation
Talented/expert ML specialists are good at feature engineering
Curse of dimensionality
More features is not better!
Every feature is a new dimension
Much of feature engineering is selecting the most relevant features → domain knowledge comes into play
Unsupervised dimensionality reduction techniques can help (PCA, K-Means)
Feature Engineering - Techniques
Numeric:
Min-Max Scaling: Rescales features to a fixed range (typically 0 to 1) to prevent features with large magnitudes from dominating model training.
Standardized Distribution/Standard Scaling (Z-score): Centers data around a mean of 0 with a standard deviation of 1 for algorithms assuming normally distributed data. Puts wide number ranges into same mat...
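The two numeric scaling techniques above reduce to short formulas: min-max scaling computes (x - min) / (max - min), and standard scaling computes (x - mean) / standard deviation. The following is a minimal standard-library sketch; the function names and the sample `incomes` values are made up for illustration, and the population standard deviation is used as an assumption.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Center values on mean 0 and scale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature with a large magnitude compared to, say, age in years.
incomes = [20_000, 40_000, 60_000, 80_000, 100_000]

scaled = min_max_scale(incomes)
z_scores = standard_scale(incomes)
print(scaled)
print([round(z, 3) for z in z_scores])
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training set only and then reused to transform validation and test data.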