ML Exam Prep

Key Algorithms

Common Algorithms:

K-Means algorithm = UL (unsupervised learning). Like a party with no event planner: 1) K = the number of leaders for people to cluster around, 2) each data point joins its closest leader, and 3) each leader moves to the mean (center) of its group. Steps 2 and 3 repeat until the groups stop changing. Finds hidden patterns in unlabeled data.
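
A minimal sketch of the leader loop, using only the standard library. The function name k_means, the sample points, and the starting leaders are made up for illustration; real libraries pick starting leaders more carefully.

```python
import math

def k_means(points, leaders, max_rounds=100):
    """Tiny K-Means sketch: points and leaders are lists of (x, y) tuples."""
    leaders = list(leaders)
    for _ in range(max_rounds):
        # Step 2: each data point joins its closest leader.
        groups = [[] for _ in leaders]
        for p in points:
            nearest = min(range(len(leaders)), key=lambda i: math.dist(p, leaders[i]))
            groups[nearest].append(p)
        # Step 3: each leader moves to the mean (center) of its group.
        moved = []
        for leader, group in zip(leaders, groups):
            if group:
                moved.append((sum(x for x, _ in group) / len(group),
                              sum(y for _, y in group) / len(group)))
            else:
                moved.append(leader)  # an empty group leaves its leader in place
        if moved == leaders:  # nobody moved, so the clusters are stable
            break
        leaders = moved
    return leaders, groups

# Hypothetical data: two obvious blobs, K = 2 starting leaders.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
leaders, groups = k_means(points, leaders=[(0.0, 0.0), (10.0, 10.0)])
```

Each leader ends at the center of its own blob, with three points per group.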

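KNN (K-Nearest Neighbors) algorithm = SL (supervised learning). Classification (common) or, rarely, regression. Classifies a data point by how similar its features are to those of its K nearest labeled neighbors. Classification answer is the majority class among the neighbors; regression answer is the average of their values.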
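
A rough sketch of the neighbor vote, assuming Euclidean distance. The function name knn_classify and the cat/dog data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label."""
    # Sort the labeled points by distance to the query and keep the k nearest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Classification: the neighbors vote, and the most common label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data.
train = [((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.0), "cat"),
         ((8.0, 8.0), "dog"), ((9.0, 8.5), "dog")]
prediction = knn_classify(train, query=(1.2, 1.4), k=3)
```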

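Label encoding = converts a categorical column to integers: City name ("Dallas", "Paris", "London") becomes a single City column of (0, 1, 2). Caution: the numbers imply an order that may not exist.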
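
A small sketch of the mapping. It assigns integers in order of first appearance, which is an assumption made here to match the example; libraries often sort the categories alphabetically instead.

```python
cities = ["Dallas", "Paris", "London", "Paris"]

# Give each new city the next unused integer (order of first appearance).
codes = {}
for city in cities:
    codes.setdefault(city, len(codes))

encoded = [codes[city] for city in cities]
```

Repeated cities reuse the same integer, so the second "Paris" gets the same code as the first.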

Latent Dirichlet Allocation (LDA): UL. NLP. Picture Dirichlet as a lazy (so UL) bible reader who looks through text (so NLP) without any labels, groups words that tend to appear together into topics, and describes each document as a mix of those topics.

Linear regression algorithm = SL. Regression. Fits the "best-fit" line that minimizes the difference between predicted and actual output. Disadv: assumes a linear relationship, so it fits non-linear data poorly. Prediction can be negative. Answer is a continuous range.
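
A minimal sketch of the best-fit line for one input feature, using the closed-form least-squares formulas. The function name fit_line and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x - 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1.0, 1.0, 3.0, 5.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 0.5 + intercept  # continuous output, negative here
```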

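Logistic regression algorithm = SL. Classification. Estimates the probability that an input is in a category: a weighted sum of the features gives the log-odds, and the logistic (sigmoid) function converts that to a probability for a binary outcome. The learned weights are interpretable, so users can check them against domain knowledge and expertise. Answer is a probability from 0 to 1, thresholded (commonly at 0.5) into a class.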
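
A sketch of the prediction step only (no training), to show how log-odds become a probability. The weights, bias, and the 0.5 threshold are made-up values for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: squeezes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    # The weighted sum of the features is the log-odds of the positive class.
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(log_odds)

weights = [0.8, -0.5]  # hypothetical learned weights
bias = -0.2            # hypothetical learned bias
p = predict_probability([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # threshold the probability into a class
```

Taking log(p / (1 - p)) of the result recovers the log-odds, which is why each weight reads as a change in log-odds per unit of its feature.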

One-Hot Encoding: turns a "City Name" column into City_New_York, City_Paris, City_Tokyo columns; each row has a single 1 (in its own city's column) and 0 everywhere else.
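
A small sketch of the column expansion in plain Python. The sample rows are made up, and sorting the categories alphabetically is an assumption.

```python
cities = ["New_York", "Paris", "Tokyo", "Paris"]

# One new column per distinct city.
categories = sorted(set(cities))
columns = [f"City_{c}" for c in categories]

# Each row gets a 1 in its own city's column and 0 elsewhere.
rows = [[1 if city == c else 0 for c in categories] for city in cities]
```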

Recurrent Neural Network (RNN): for sequential data (such as time series, speech, or text).

eXtreme Gradient Boosting (XGBoost): SL. Gradient boosted trees algorithm: builds trees sequentially, with each new tree correcting the errors of the previous ones. Good at regression, classification, and ranking. Good on tabular data. Set the objective parameter to multi:softmax (together with num_class) for multiclass problems such as product categorization.
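
A toy sketch of the "each new tree corrects previous errors" idea: plain gradient boosting on squared error with one-split trees (stumps). This is not XGBoost itself, which adds regularization and other refinements. The names fit_stump and boost, the data, and the learning rate are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best reduces squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)  # start from the average answer
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # Residuals are the errors left over by all previous stumps.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # The new stump nudges each prediction toward its residual.
        predictions = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                       for x, p in zip(xs, predictions)]
    return base, stumps, predictions

# Hypothetical data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
base, stumps, predictions = boost(xs, ys)
```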

Features/Highly Correlated Data:

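Pearson Correlation Coefficient: measures linear correlation between two features. 0 = no linear correlation, 1 (or -1) = strong positive (or negative) correlation. A value of 0 suggests, but does not prove, independence. Use a Naive Bayes model if features are independent; otherwise use a full Bayesian network if dependent.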
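
A sketch of the coefficient computed by hand, with made-up data. The last example shows why 0 only rules out a linear relationship.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])             # rises together
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])           # one rises, one falls
r_curve = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2: dependent, no linear trend
```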

Principal Component Analysis (PCA): UL. Reduces dimensions while keeping as much of the data variance as possible.

Recursive Feature Elimination (RFE): iteratively trains the model, ranks features by importance (e.g., based on coefficients in logistic regression), removes the least important features, and repeats the process until the target number of features remains.

Simplest option: remove a portion of the highly correlated features from the data (keep one feature from each correlated group).

Outliers:

Random Cut Forest (RCF): UL. Outlier (anomaly) detection; gives each data point an anomaly score.

Recommendations:

Factorization Machines (FM): For recommendation systems and sparse data prediction (e.g., product recommendations).

Images:

Convolutional neural network (CNN): DL for images; slides small learned filters (e.g., 3x3 pixel blocks) across the pixel grid for pattern recognition. Early layers detect local patterns such as edges; deeper layers combine them into more abstract features. Ideal for computer vision, image classification, OCR, and medical imaging.
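
A sketch of one filter sliding over a tiny grayscale grid. The image and the hand-written vertical-edge filter are made up; a real CNN learns its filter values during training.

```python
def convolve(image, kernel):
    """Slide a square kernel over a 2-D grid (no padding, stride 1).

    Like most deep learning libraries, this does not flip the kernel.
    """
    size = len(kernel)
    out = []
    for r in range(len(image) - size + 1):
        row = []
        for c in range(len(image[0]) - size + 1):
            # Multiply the block under the kernel element-wise and sum it up.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

# Hypothetical 3x6 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve(image, vertical_edge)
```

The feature map is nonzero only where the 3x3 window straddles the edge, and 0 over the flat dark and flat bright regions.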

Semantic segmentation: classifies every pixel in an image, giving precise object boundaries such as the outlines of people.

Single Shot MultiBox Detector (SSD): real-time object detection algorithm.

Time Series Forecasting Terms (Traditional Statistics)

ARIMA: AutoRegressive (AR) + Integrated (I = differencing) + Moving Average (MA). For simple, single time series.
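
A sketch of only the differencing (I) piece, which removes trend before the AR and MA parts are fit. The function name difference and the series are made up.

```python
def difference(series):
    """Replace each value with its change from the previous value."""
    return [b - a for a, b in zip(series, series[1:])]

series = [3, 5, 8, 12, 17]   # hypothetical series with a growing trend
first = difference(series)   # differencing once (d = 1)
second = difference(first)   # differencing twice (d = 2)
```

Here one round of differencing still trends upward, while the second round is flat, so d = 2 would make this series stationary.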

CNN-QR: DL (convolutional neural network with quantile regression) for very complex time series.

DeepAR+: DL (recurrent neural network) for very complex time series.

ETS = family of models built from Error, Trend, and Seasonal components using ExponenTial Smoothing. Forecasts with a weighted average of past observations, where the weights decrease exponentially as observations get older.
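
A sketch of simple exponential smoothing, the most basic member of the family (level only, no trend or seasonal piece). The function name, the series, and alpha = 0.5 are made up for illustration.

```python
def exponential_smoothing(series, alpha):
    """Each smoothed value blends the newest observation with the old level.

    Unrolling the loop shows older observations get weights that shrink
    by a factor of (1 - alpha) at every step.
    """
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

smoothed = exponential_smoothing([10.0, 20.0, 10.0, 20.0], alpha=0.5)
```

The last smoothed value serves as the forecast for the next period; a higher alpha reacts faster to recent observations.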

Prophet: time series with strong seasonality plus holiday effects.
