AI Exam 3B - Data and Learning

Abbreviations
IDP = Intelligent Data Processing
L = labeled data, U = unlabeled data
SL = supervised learning, UL = unsupervised learning

Data Terms

Classification = SL. Groups data into known, labeled groups. Ex 1: data = car pictures labeled by make and model. Ex 2: grouping customers by sentiment. Types are binary classification and multi-class classification.
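A minimal sketch of binary classification: a nearest-centroid classifier over made-up sentiment scores (the data, labels, and single-feature setup are all illustrative assumptions, not a specific library's API).

```python
# Toy binary classifier: learn one centroid per label, then assign
# new values to whichever centroid is closest.

def train_centroids(samples):
    """Average the feature value for each label in the labeled training data."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(value, centroids):
    """Predict the label whose centroid is nearest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Hypothetical labeled training data: (sentiment score, label)
training = [(0.9, "positive"), (0.8, "positive"), (0.1, "negative"), (0.2, "negative")]
centroids = train_centroids(training)
prediction = classify(0.85, centroids)  # -> "positive"
```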
Clustering = UL. groups data with no labels into previously unknown groups.
Curating = structures the data for processing prior to learning. See Labeling.
De-identification = removing PII such as Social Security numbers.
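A sketch of de-identification via pattern masking, assuming U.S.-style SSN formatting; the regex and the `[REDACTED]` placeholder are illustrative choices, not a standard.

```python
import re

# Mask anything that looks like a Social Security number (PII).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def deidentify(text):
    """Replace SSN-shaped substrings with a placeholder token."""
    return SSN_PATTERN.sub("[REDACTED]", text)

cleaned = deidentify("Customer SSN 123-45-6789 on file.")
# -> "Customer SSN [REDACTED] on file."
```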
Encoding = converts from non-numeric to numeric.
Governance = managing, securing, and monitoring data throughout its lifecycle
IDP = extracts and classifies unstructured data in documents; gives summaries and actionable insights.
Labeling = identifies and tags each piece of data with content labels, thus classifying it. See Curating.
Multi-Modal = uses multiple data types (such as text, images, audio, video, and computer code).
Multi-Modal Embedding = uses multiple data types embedding them into a shared space. search focus.
Multi-Modal Generation = uses multiple data types to create new content.
Normalizing = scales numbers to fit range without distorting.
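A sketch of min-max normalization, one common way to scale numbers into a fixed range without distorting their relative spacing (the input values are arbitrary):

```python
# Min-max normalization: rescale values into [0, 1] while preserving
# their order and relative distances.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10, 20, 40])  # -> [0.0, 0.333..., 1.0]
```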
Retention = rules for when data is deleted or kept.
Summarization = distills content into key insights.

Data Unit Terms
"Embeddings" = converts discrete data (like words, images, or categorical features) into numerical vectors that capture semantic relationships. Purpose: Enables models to process and compare inputs efficiently.
"Embedding"/"Transformation"/"Vectorization" process = the algorithm that creates the math vector.
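A toy sketch of why embeddings are useful: once inputs are vectors, similarity can be computed with cosine similarity. Real embeddings come from a trained model; these 3-dimensional vectors are hand-made stand-ins.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embedding vectors: "cat" and "kitten" point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
car = [0.0, 0.1, 0.95]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, car)
```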
"Feature" in ML = dataset property or characteristic used as input to an ML model to make predictions. Almost the same as "dimension". Ex: square feet, actual price, asking price, etc.
"Point" in AI = exact coordinates in an array, with one number for each dimension.
Self-Attention = enables a model to weigh the importance of different words in a sequence relative to a specific word. For context and long-range dependencies. Ex: "bank" in "river bank" versus "bank deposit".
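A toy sketch of how attention weights arise: scores are dot products between a query vector and key vectors, then softmax-normalized so they sum to 1. The 2-dimensional vectors are made up; real models learn them.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Dot-product scores between one query and each key, softmaxed."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Hypothetical: "bank" attending over ["river", "deposit"]; with these
# toy vectors the query aligns more with "river".
query = [1.0, 0.0]
keys = [[0.9, 0.1], [0.1, 0.9]]
weights = attention_weights(query, keys)
```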
"Token" in AI = smallest unit of work; the main factor in overall cost. Ex: words in a sentence. If a language model errors, check the max context size, which limits how many tokens it can process at once. If a book's length exceeds this limit, the model cannot handle the full input, leading to failure in summarization.
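A sketch of the context-limit check described above, using a naive whitespace "tokenizer". Real tokenizers split on subwords, and the limit of 8 tokens is an arbitrary stand-in for a model's real context window.

```python
# Naive token counting and a context-window check.
MAX_CONTEXT = 8  # illustrative limit; real models allow far more

def count_tokens(text):
    """Crude token count: split on whitespace."""
    return len(text.split())

def fits_context(text, limit=MAX_CONTEXT):
    """True if the input fits inside the model's context window."""
    return count_tokens(text) <= limit

short = "Summarize this sentence."
long_text = "This input has far too many words to fit inside the tiny context window."
# fits_context(short) -> True; fits_context(long_text) -> False
```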

Learning Terms
Continued Pre-training = you provide U to further pre-train a FM by showing it certain types of inputs.
Deductive = applying general rules to reach specific outcomes.
Emergent = at large scales, these models develop skills that are not explicitly programmed into them.
Federated Learning = instead of bringing data to a central server (traditional), this brings the model to the data. Good for data privacy and local compliance.
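A sketch of the central idea in federated averaging: each client trains on its own private data, and only model weights (never raw data) are sent back and averaged. The client weight vectors here are made-up values, not a real training run.

```python
# Federated averaging: element-wise mean of weight vectors from clients.
def federated_average(client_weights):
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

clients = [
    [0.2, 0.4],  # weights learned locally on client A's private data
    [0.4, 0.6],  # client B
    [0.6, 0.8],  # client C
]
global_weights = federated_average(clients)  # approx. [0.4, 0.6]
```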
"Fine tuning" = improves an existing pre-trained LM using L (ex: industry-specific data, or pairs of input and desired output). The most important task for fine tuning is labeling with accurate and relevant labels. Types are instruction tuning, RLHF, adapting models for specific domains, transfer learning, and continued pre-training.
"Generalization" = model's ability to apply knowledge from training on new unseen data. 
Inductive = using evidence to determine outcomes. Builds a general model to predict future, unseen data.
Instruction Tuning = method to fine-tune LLMs on instructional prompts and desired outputs. 
"Masking" of Input = intentionally hiding parts of the input, which forces models to understand context.
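A sketch of input masking as done in masked language modeling: hide some tokens so a model must predict them from the surrounding context. The sentence, positions, and `[MASK]` symbol are illustrative.

```python
# Replace the tokens at the chosen positions with a mask symbol.
def mask_tokens(tokens, positions, mask="[MASK]"):
    return [mask if i in positions else t for i, t in enumerate(tokens)]

tokens = "the cat sat on the mat".split()
masked = mask_tokens(tokens, {1, 5})
# -> ["the", "[MASK]", "sat", "on", "the", "[MASK]"]
```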
Training in ML = iteratively teaching an ML model to find patterns, make decisions, or generate content.
Transductive = predicts specific labels for a fixed set of U by using both the L training data and the distribution of the U test set. Optimizes for performance on a specific dataset.
Transfer Learning = takes an existing model pre-trained on a supervised task and then fine-tunes it.
