AI Exam: 3B - Data

AI Practitioner Exam Prep - Data

Abbreviations
IDP = Intelligent Document Processing

Data Terms
Classification = supervised learning (SL). Assigns data to known, labeled groups. Ex 1: car pictures labeled by make and model. Ex 2: grouping customer feedback by sentiment. Types are binary and multi-class classification.
Clustering = unsupervised learning (UL). Groups unlabeled data into previously unknown groups.
Curating = structures the data for processing before learning. See Labeling.
De-identification = removing PII, such as Social Security numbers.
Encoding = converts non-numeric data to numeric form.
Governance = managing, securing, and monitoring data throughout its lifecycle
IDP = extracts and classifies unstructured data in documents. Produces summaries and actionable insights.
Labeling = identifies each piece of data and tags it with a content label, thereby classifying it. See Curating.
Multi-Modal = uses multiple data types (such as text, images, audio, video, and computer code).
Multi-Modal Embedding = embeds multiple data types into a shared vector space. Main use: search.
Multi-Modal Generation = uses multiple data types to create new content.
Normalizing = scales numbers to fit a range without distorting the differences between them.
Retention = rules for when to keep or delete data.
Summarization = condenses content into its key insights.
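
To make the Encoding and Normalizing entries above concrete, here is a minimal Python sketch. It assumes one-hot encoding and min-max scaling, which are only one common choice for each term. The function names and sample values are hypothetical and not part of any exam material.

```python
def one_hot_encode(values):
    """Encoding: map each non-numeric category to a 0/1 vector."""
    categories = sorted(set(values))  # fixed, repeatable column order
    return [[1 if value == category else 0 for category in categories]
            for value in values]


def min_max_normalize(numbers, low=0.0, high=1.0):
    """Normalizing: rescale numbers into [low, high], keeping relative spacing."""
    smallest, largest = min(numbers), max(numbers)
    if smallest == largest:
        # All values identical: no spread to preserve, so map everything to low.
        return [low for _ in numbers]
    span = largest - smallest
    return [low + (n - smallest) / span * (high - low) for n in numbers]


makers = ["Ford", "Toyota", "Ford", "Honda"]  # hypothetical sample data
square_feet = [800, 1200, 2000]               # hypothetical sample data

encoded = one_hot_encode(makers)              # columns: Ford, Honda, Toyota
normalized = min_max_normalize(square_feet)   # smallest -> 0.0, largest -> 1.0
```

The one-hot columns follow alphabetical order of the categories, so the same input always produces the same layout.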

Data Unit Terms
"Embeddings" = numerical vectors that represent discrete data (like words, images, or categorical features) and capture semantic relationships. Purpose: enables models to process and compare inputs efficiently.
"Embedding"/"Transformation"/"Vectorization" process = the algorithm that creates the numerical vector.
"Feature" in ML = a dataset property or characteristic used as input to an ML model to make predictions. Almost the same as "dimension". Ex: square feet, actual price, asking price, etc.
"Point" in AI = a single data item located in vector space by its coordinates, with one number for each dimension.
Self-Attention = enables a model to weigh the importance of different words in a sequence relative to a specific word. Captures context and long-range dependencies. Ex: "bank" in "river bank" versus "bank deposit".
"Token" in AI = smallest unit of text a model processes. Token count is a main factor in overall cost. Ex: words, or pieces of words, in a sentence. If a language model returns an error, check its maximum context size, which limits how many tokens it can process at once. If a book's length exceeds this limit, the model cannot handle the full input, and summarization fails.

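Two of the ideas above, comparing embeddings and staying under a token limit, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular model or service works. The three-number vectors are made up, cosine similarity is one common way to compare embeddings, and counting one whitespace-separated word as one token is a simplification (real tokenizers usually split text into subwords).

```python
import math


def cosine_similarity(a, b):
    """Compare two embedding vectors by direction; higher means more similar.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def split_into_chunks(text, max_tokens):
    """Split text into pieces that each fit within max_tokens.

    Assumption: one whitespace-separated word counts as one token.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


# Hypothetical 3-dimensional embeddings (real ones have far more dimensions).
river_bank = [0.9, 0.1, 0.0]
lake_shore = [0.8, 0.2, 0.1]
bank_deposit = [0.1, 0.9, 0.2]

similar = cosine_similarity(river_bank, lake_shore)
different = cosine_similarity(river_bank, bank_deposit)

book = "word " * 10                             # stand-in for a long document
chunks = split_into_chunks(book, max_tokens=4)  # hypothetical limit of 4 tokens
```

Splitting an over-long input into chunks that each fit the limit, then summarizing chunk by chunk, is one common workaround when a book exceeds the context size.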