AWS Certification Prep - General Terms
General Misc Abbreviations
APIs = Application Programming Interfaces
docs = documents
ex = example
GPU = Graphics Processing Unit
IDE = Integrated Development Environment
ISV = Independent Software Vendor
NIST = National Institute of Standards and Technology
OWASP = Open Web Application Security Project
PII = Personally Identifiable Information
SME = Subject Matter Expert

General Terms
Bias = systematic error caused by missing or unrepresentative training data. Types are selection/sampling bias (unrepresentative data), interaction/participant bias (users withhold information, skewing responses), reporting/measurement bias (over-represents one group), confirmation/automation bias (reinforcing existing stereotypes, or over-trusting model output), and historical/societal bias (data reflects the prejudices of the historical setting in which it was collected).
Decision trees = CASE statement algorithm. Good for transparency.
Explainability = explains WHY a model produced a result, after the fact. Treats the model as a black box. Good for debugging and troubleshooting.
Interpretability = explains HOW a model makes decisions, before it runs. Treats the model as a white box (transparency of the model). Good for accountability and building trust.
Latency = time to get a response to a request. Real-time applications require low latency.
Nondeterminism = running the model with the same input data can produce different outputs.
Overfitting = the model works on the training data but not on the evaluation data, because it memorizes the data it has seen and fails to generalize to unseen examples.
Underfitting = the model performs poorly even on the training data, because it is too simple to capture the important features of the data, and so also fails to generalize to unseen examples.
Variance = sensitivity to noise or overfitting.

General Math Abbreviations
AUC = Area Under the Curve
GLM = Generalized Linear Models
KNN = K-Nearest Neighbors
LDA = Latent Dirichlet Allocation algorithm 
MSE = Mean Squared Error
PCA = Principal Component Analysis algorithm 
RMSE = Root Mean Squared Error
ROC = Receiver Operating Characteristic curve
TF-IDF = Term Frequency-Inverse Document Frequency
XGBoost = eXtreme Gradient Boosting

Math Terms
"Accuracy" in Classification = correct predictions / total predictions
Classification = data divided into categories
"Dimension" in ML = a dataset property or characteristic expressed as a mathematical dimension in vector math. Almost the same as "feature". Ex: the 3rd number in an array.
F1 Score in Classification = harmonic mean of precision and recall. F1 = 2 x (Precision x Recall) / (Precision + Recall).
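As a quick check of the F1 formula above, a minimal Python sketch (the precision and recall values are made up for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

# Illustrative values only: precision = 0.8, recall = 0.5
print(f1_score(0.8, 0.5))  # about 0.615
```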
Generalized Linear Models = a family of algorithms that extend linear regression to non-normal data.
Example regressions are: Logistic, Poisson, Gamma, and Tweedie.
Linear regression = algorithm that finds the "best-fit line" minimizing the difference between actual values and the model's predictions. Disadvantage: does not work well with non-normal data.
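A minimal sketch of fitting a best-fit line with NumPy's `polyfit` (the data points here are invented for illustration):

```python
import numpy as np

# Invented example data, roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Fit a degree-1 polynomial, i.e. a straight line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
predictions = slope * x + intercept  # the model's predictions along the line
```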
Precision in Classification = true positives / (true positives + false positives)
Recall in Classification = true positives / (true positives + false negatives)
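The two formulas above, plus accuracy, computed from an invented confusion matrix (the TP/FP/FN/TN counts are made up for illustration):

```python
# Made-up confusion-matrix counts for a binary classifier
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)                  # 40 / 50 = 0.8
recall = tp / (tp + fn)                     # 40 / 60, about 0.667
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 70 / 100 = 0.7
```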
Regularization = penalizes extreme weight values to help prevent linear models from overfitting.
Root Mean Squared Error = standard metric for regressions. Averages the squared differences between predictions and actuals, then takes the square root.
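A short sketch of the RMSE calculation in plain Python (the actual/predicted values are made up):

```python
import math

# Invented example values
actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.5, 2.0]

# Mean of the squared differences, then the square root
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)  # about 0.408
```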
Vectors = numerical N-dimensional arrays. Ex: the X and Y coordinates of a point form a 2-D vector.
Vector Space = the cloud of points formed by all of the vectors.
VAEs = Variational Autoencoders. An encoder and decoder where the latent space follows a probability distribution such as a Gaussian.
XGBoost = gradient-boosted trees algorithm. Good at regression, classification, and ranking. Works well on tabular data.