ML Exam: 1 Bedrock
ML Exam: 1
Bedrock
Bedrock = Container serverless service for FMs from 3rd parties (such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon) or custom for GenAI. Fully managed. Has IAM security (with custom roles to limit data access), ensures the security of your data by encrypting it both in transit and at rest with KMS, watermark detection, single API, and good S3 link. If fetch error via S3, then check decryption rights. Features: 1) "Provisioned Throughput Mode" for handling large, steady workloads. 2) can create a RAG agent to fetch data in it.
Pricing: 1) On-Demand = good for variable or unpredictable workloads. 2) Provisioned Throughput: good for steady workloads or custom models. Users commit to a set throughput for 1 or 6-month periods for lower costs. 3) Batch Processing: good for asynchronous, large-scale tasks.
Agents = enables AI assistants to interact with specific data sources, query external APIs if real-time data, create and compare responses, and prioritize results. Good for chatbots, claims, etc.
Endpoints = fully managed. can even limit to region. "Data Capture" is a feature that records
data asynchronously to potentially retrain the model.
Guardrails = filters bad user input, PII, or bad topics to FMs.
Knowledge Bases = holds company info. When configured as a knowledge base for Bedrock, S3 enables auto doc chunking, embedding generation, and vector storage for retrieval during question-answering.
Chunking Strategies:
1. Default Chunking - Mechanism: Auto splits text docs into chunks of 300 tokens. Boundary Control: Balances size limits with honoring sentence boundaries (so don't chop sentences in half). Best For: Standard text assets like FAQs, blog posts, and internal docs. It serves as an ideal baseline for initial RAG prototyping.
2. Fixed-Size Chunking - Mechanism: Divides text into rigid segments based on a user-configured maximum token count per chunk. Overlap Percentage: Allows a configurable sliding window overlap (typically set between 10% to 20%). This ensures content continuity across sequential chunks. Best For: Uniformly structured datasets where predictable token budgets per document slice are strictly required.
3. Hierarchical Chunking - Mechanism: Partitions data into broad "parent chunks" split into precise "child chunks". Retrieval Logic: Search index evaluates child chunks for relevance. Once matched, gives parent chunk to FM. Best For: Complex, highly organized materials like academic papers, regulatory frameworks, and technical manuals containing dense sub-clauses.
4. Semantic Chunking - Mechanism: NLP. Detects topic shifts. Neighboring sentences are grouped or separated based on semantic similarity thresholds. Constraints: It is computationally intensive, adds model-processing latency, and fails on files over 1 million characters. Best For: Long-form or concept-dense docs (e.g., complex legal contracts) where structural line breaks do not align with thematic context shifts.
5. "No Chunking" via Lambda - . Mechanism: Routes external raw data to Lambda function. Allows developer-defined custom boundaries (e.g., splitting by PDF page #s or Markdown header levels). Allows inject chunk-level metadata for downstream hybrid search filtering. Best For: Special or proprietary data like LangChain or LlamaIndex tokenizers.
Model Evaluation = compares the FMs giving metrics.
Nova Canvas model = creates/edits hi-res image from prompt
Nova Lite model = supports multiple languages and is low cost. Nova Pro model = text/image/video analysis
Nova Reel model = creates videos PartyRock = very cheap and experimental environment for learning about gen AI apps, allowing users to quickly build, test, and iterate on AI apps without incurring significant costs.
Stable Diffusion 3.5 Large = creates high quality images based on text inputs.
Titan FMs:
1) Titan Text Premier: Optimized for enterprise, including RAG on Bedrock Knowledge Bases and function calling inside autonomous AI Agents. Does reasoning, data extraction, and heavy-duty text workflows.
2) Titan Text Express: General-purpose model (context length limit of 8,000 tokens). It balances speed and performance. Good for: interactive chat, open-ended brainstorming, and summaries.
3) Titan Text Lite: A lightweight, low cost for simple, high-volume tasks. Good at: text classification, basic copywriting, and fine-tuning on niche datasets.
4) Titan Image Generator models = text-to-image, inpainting, outpainting, erase object and other tools, and invisible watermarking.
5) Titan Text Embeddings V2: for RAG and vector searches (such as KNN). Accepts 8,192 tokens/input and supports over 25 languages. Has flexible vector sizing to reduce DB storage costs while retaining 97% accuracy.
6) Multimodal Embeddings: Multimodal. Builds rich, reverse-image search engines (searching images using other images) or search visual product catalogs using unstructured NLP text.
Comments
Post a Comment