skip to content
luminary.blog
by Oz Akan
cover

AWS AI Practitioner Certification Notes

Guide to the AWS Certified AI Practitioner exam, covering key concepts, AWS services, and real-world applications.

/ 44 min read

Updated:
Table of Contents

Understanding AI Concepts

Hierarchy of AI Concepts

AI Concepts:

  • Artificial Intelligence (AI): Broadly encompasses machines mimicking human intelligence.

  • Machine Learning (ML): Subset of AI where machines learn from data without explicit programming.

  • Deep Learning: Subset of ML using multi-layered neural networks for complex pattern analysis.

  • Generative AI: Application of deep learning focused on creating new content.

AI -> ML -> Deep Learning -> Generative AI

Generative AI and Foundational Models (FM)

Introduction to Generative AI

Generative AI, a cutting-edge field in artificial intelligence, focuses on creating new content. This capability is powered by foundational models (FMs), which are large-scale deep learning models trained on extensive datasets. These models possess a broad understanding of various aspects of the world and can generate diverse content, including text, images, and code.

Lifecycle of a Foundational Model

The development and deployment of an FM typically involves a six-stage lifecycle:

  1. Data Selection: The process begins with carefully choosing the data on which the model will be trained. This data should be vast, diverse, and relevant to the intended applications of the model.

  2. Pre-training: This stage involves training the model on the selected data using self-supervised learning. In this approach, the model generates labels from the data itself, learning to identify patterns and relationships without explicit human annotation.

  3. Optimization: Once pre-trained, the model can be further optimized for specific tasks or domains. Techniques like prompt engineering, retrieval-augmented generation (RAG), and fine-tuning on task-specific data are commonly used to enhance the model’s performance.

  4. Evaluation: Thorough evaluation is crucial to assess the model’s capabilities and identify areas for improvement. This involves testing the model on various benchmarks and datasets to measure its accuracy, fluency, and ability to handle different tasks.

  5. Deployment: After successful evaluation, the model can be deployed to various applications, such as chatbots, content creation tools, or research platforms. This often involves integrating the model into APIs or other systems to enable seamless access and utilization.

  6. Feedback and Continuous Improvement: The lifecycle is not linear but iterative. Feedback from users and ongoing monitoring of the model’s performance provide valuable insights for further optimization and refinement. This continuous improvement ensures the model remains effective and adapts to evolving needs and challenges.

Large Language Models (LLMs)

Introduction to LLMs

Large Language Models (LLMs), a prominent type of foundational model, are specifically designed to understand and generate human language. They are trained on massive text datasets, enabling them to perform various language-related tasks, such as translation, summarization, question answering, and dialogue generation.

Key Concepts in LLMs

  • Transformers Architecture: LLMs utilize a specific neural network architecture called transformers, which excels at processing sequential data like text. This architecture allows the model to capture long-range dependencies and relationships between words, leading to a deeper understanding of language.

  • Tokens: LLMs process text by breaking it down into smaller units called tokens. These tokens can be words, subwords, or even individual characters. Tokenization standardizes the input data, making it easier for the model to process and analyze.

  • Embeddings and Vectors: LLMs represent tokens as numerical vectors, known as embeddings. These embeddings capture the semantic meaning of words and their relationships with other words. By analyzing these vectors, the model can understand the context and nuances of language.

Diffusion Models

Introduction to Diffusion Models

Diffusion models, a class of deep learning models, have gained significant attention for their ability to generate high-quality images and other data. These models operate by gradually adding noise to the training data until it becomes completely random, and then learning to reverse this process to generate new data from noise.

Applications and Process

The most notable applications of diffusion models are in text-to-image generation, where they create realistic images from textual descriptions. However, their potential extends beyond images, with applications in areas like drug discovery and materials design.

The diffusion process involves two main steps:

  1. Forward Diffusion: Noise is progressively added to the data until it reaches a state of pure noise.

  2. Reverse Diffusion: The model learns to reverse this process, starting from pure noise and gradually refining it to generate meaningful data.

Multimodal Models

Multimodal models represent a significant advancement in AI, capable of processing and generating information from multiple modalities, such as text, images, audio, and video. This ability allows them to understand and respond to complex inputs that involve different types of data, leading to more comprehensive and nuanced AI applications.

Other Generative Models

Introduction to Other Generative Models

This section explores additional generative models that contribute to the diverse landscape of AI:

Generative Adversarial Networks (GANs)

GANs consist of two neural networks, a generator and a discriminator, that compete against each other. The generator creates new data instances, while the discriminator evaluates their authenticity. This adversarial process pushes both networks to improve, leading to the generation of increasingly realistic data.

Variational Autoencoders (VAEs)

VAEs utilize a combination of deep learning and statistical techniques to learn a compressed representation of the input data. They can then generate new data by sampling from this compressed representation, offering a powerful tool for creative applications.

Optimizing Model Outputs

The performance of generative AI models can be significantly enhanced through various optimization techniques:

  • Prompt Engineering: This involves carefully crafting the input prompts to guide the model towards generating desired outputs. It’s a crucial technique for controlling the model’s behavior and ensuring its responses align with specific needs.

  • Fine-tuning: This process involves further training a pre-trained model on a smaller, task-specific dataset to adapt it to a particular application or domain. Fine-tuning enhances the model’s accuracy and relevance for specific use cases.

  • Retrieval-Augmented Generation: This technique involves providing the model with access to external knowledge sources, such as databases or documents, to enhance its ability to generate informed and contextually relevant responses.

AWS Services for AI

AWS offers a comprehensive suite of services that cater to various aspects of AI development and deployment:

  • Amazon SageMaker: A fully managed service that provides tools for building, training, and deploying machine learning models.

  • Amazon Comprehend: A natural language processing (NLP) service that enables text analysis, sentiment analysis, and topic modeling.

  • Amazon Translate: A language translation service that supports multiple languages and can be integrated into various applications.

  • Amazon Textract: A service that extracts text and data from scanned documents, images, and PDFs.

  • Amazon Lex: A service for building conversational interfaces and chatbots, enabling natural language interactions with users.

  • Amazon Polly: A text-to-speech service that converts text into lifelike speech, supporting multiple languages and voices.

  • Amazon Transcribe: An automatic speech recognition (ASR) service that converts speech to text, enabling transcription and analysis of audio data.

  • Amazon Rekognition: A computer vision service that analyzes images and videos to identify objects, scenes, and faces.

  • Amazon Kendra: An enterprise search service that uses machine learning to provide accurate and relevant search results from various data sources.

  • Amazon Personalize: A service that enables real-time personalization and recommendations, tailoring user experiences based on their preferences and behavior.

  • AWS DeepRacer: A 1/18th scale autonomous race car that provides a hands-on learning experience for reinforcement learning.

  • Amazon SageMaker JumpStart: A service that offers pre-built solutions and models for common machine learning use cases, accelerating development.

  • Amazon Bedrock: A service that provides access to foundation models (FMs) from various providers, enabling developers to integrate generative AI capabilities into their applications.

  • PartyRock: An Amazon Bedrock Playground for experimenting with AI applications and exploring the capabilities of different FMs.

  • Amazon Q: A generative AI-powered assistant that can answer questions, generate content, and complete tasks based on enterprise data.

  • Amazon Q Developer: An ML-powered code recommendation tool that assists developers in writing code for various programming languages and applications.

Real-World Use Cases of Generative AI

Applications of Generative AI Across Industries

Generative AI is transforming various industries by enabling new capabilities and automating complex tasks. Here are some notable use cases across different sectors:

Media and Entertainment

  • Content Generation: AI can generate scripts, dialogues, and even entire stories for movies, TV shows, and games, enhancing creativity and efficiency in content production.

  • Virtual Reality: AI can create immersive and interactive virtual environments, enriching gaming experiences and enabling realistic simulations for training and entertainment.

  • News Generation: AI can generate news articles and summaries based on raw data or events, automating content creation and providing concise information.

Retail

  • Product Review Summaries: AI can summarize customer reviews for products, providing concise insights that help consumers make informed purchasing decisions.

  • Pricing Optimization: AI can model different pricing scenarios to determine optimal pricing strategies that maximize profits and maintain competitiveness.

  • Virtual Try-ons: AI can generate virtual models of customers for virtual try-on experiences, enhancing online shopping and reducing purchase uncertainty.

  • Store Layout Optimization: AI can optimize store layouts to improve customer experience and boost sales by analyzing customer traffic patterns and product placement.

Healthcare

  • AWS HealthScribe: This service enables healthcare software vendors to build applications that automatically generate clinical notes by analyzing patient-clinician conversations, improving documentation efficiency and accuracy.

  • Personalized Medicine: AI can generate personalized treatment plans based on a patient’s genetic makeup, lifestyle, and disease progression, leading to more effective and targeted healthcare.

  • Medical Imaging: AI can enhance, reconstruct, and even generate medical images, aiding in diagnosis and treatment planning.

Life Sciences

  • Drug Discovery: AI can generate potential molecular structures for drugs, accelerating the drug discovery process and reducing development costs.

  • Protein Folding Prediction: AI can predict the 3D structures of proteins, crucial for understanding diseases and developing new therapies.

  • Synthetic Biology: AI can generate designs for synthetic biological systems, such as engineered organisms or biological circuits, with applications in various fields.

Financial Services

  • Fraud Detection: AI can create synthetic datasets to train fraud detection models, improving their ability to identify and prevent fraudulent activities.

  • Portfolio Management: AI can simulate market scenarios to aid in the creation and management of robust investment portfolios, optimizing returns and mitigating risks.

  • Debt Collection: AI can generate effective communication strategies for debt collection, increasing the rate of successful collections while maintaining ethical practices.

Manufacturing

  • Predictive Maintenance: AI can predict maintenance schedules for machinery, reducing downtime and optimizing production efficiency.

  • Process Optimization: AI can optimize production processes by modeling different scenarios and considering factors like cost, time, and resource usage.

  • Product Design: AI can generate new product designs based on specified parameters and constraints, optimizing for factors like cost, materials, and performance.

  • Material Science: AI can generate new material compositions with desired properties, leading to innovations in manufacturing and product development.

When to Use AI/ML

Artificial intelligence and machine learning (AI/ML) offer powerful solutions for complex tasks that are challenging to address with traditional programming approaches. Here are some key scenarios where AI/ML excels:

  1. Complexity of Rules: When tasks involve intricate rules or numerous variables that are difficult to code manually, AI/ML can learn from data and adapt to complex scenarios.

  2. Scale of the Project: For large-scale tasks that would be inefficient or impossible for humans to handle manually, AI/ML can automate processes and analyze vast amounts of data efficiently.

Supervised Learning

Supervised learning is a type of machine learning where the model is trained on labeled data, meaning the data includes input features and their corresponding desired outputs. This approach allows the model to learn the relationship between inputs and outputs and make predictions on new, unseen data.

Key Techniques

  • Classification: This technique involves assigning labels or categories to data instances based on patterns learned from the training data.

  • Regression: This technique predicts continuous or numerical values based on input variables, such as predicting stock prices or customer churn probability.

Unsupervised Learning

Unsupervised learning involves training a model on unlabeled data, where the model identifies patterns and structures in the data without explicit guidance. This approach is useful for tasks like clustering and dimensionality reduction.

Key Techniques

  • Clustering: This technique groups data points into clusters based on their similarity, enabling the identification of natural groupings and patterns in the data.

  • Dimensionality Reduction: This technique reduces the number of features or variables in a dataset while preserving important information, simplifying data analysis and visualization.

Challenges of Generative AI

While generative AI offers tremendous potential, it also presents unique challenges that need careful consideration:

  1. Regulatory Violations: Generative AI models can inadvertently generate content that violates regulations or exposes sensitive data. Mitigations include data anonymization, privacy-preserving techniques, and regular audits.

  2. Social Risks: The ability to generate realistic content raises concerns about misuse for creating harmful or misleading information. Thorough testing and ethical guidelines are essential to mitigate these risks.

  3. Data Security and Privacy Concerns: Generative AI models can be vulnerable to attacks that aim to extract training data or manipulate their behavior. Robust cybersecurity measures and responsible data handling practices are crucial.

  4. Toxicity: Models can generate toxic or offensive content if trained on biased or inappropriate data. Careful data curation and the use of guardrail models can help mitigate this issue.

  5. Hallucinations: Generative AI models can sometimes generate inaccurate or nonsensical information, known as hallucinations. User education and content verification mechanisms are important to address this challenge.

  6. Interpretability: Understanding why a model generates specific outputs can be challenging. Using domain knowledge in model development and employing explainable AI techniques can improve interpretability.

  7. Nondeterminism: Generative AI models can produce different outputs for the same input due to their probabilistic nature. Extensive testing and fine-tuning can help improve consistency and reliability.

Capabilities of Generative AI

Generative AI offers a range of capabilities that are transforming various aspects of technology and business:

  1. Task Automation: Automating tedious and repetitive tasks, freeing up human resources for more creative and strategic endeavors.

  2. Decision Support: Analyzing data and identifying patterns to aid in informed decision-making, improving efficiency and accuracy.

  3. Adaptability: Learning from data to generate content for diverse tasks and domains, demonstrating versatility and broad applicability.

  4. Responsiveness: Generating real-time content for dynamic interactions, enabling engaging and personalized user experiences.

  5. Simplicity: Simplifying complex content creation processes, making technology more accessible and user-friendly.

  6. Creativity: Generating novel ideas, designs, and solutions by recombining elements in unique ways, fostering innovation and exploration.

  7. Data Efficiency: Learning from limited datasets, making AI applicable in scenarios with scarce data.

  8. Personalization: Creating tailored content to enhance user experiences and engagement, leading to more effective and satisfying interactions.

  9. Scalability: Quickly generating large amounts of content once trained, enabling efficient content production and distribution.

Business Metrics for Generative AI

Measuring the effectiveness of generative AI in a business context requires specific metrics that capture its impact on key objectives. Here are some relevant metrics:

  • User Satisfaction: Assessing how satisfied users are with the content generated by the AI, reflecting its ability to meet user needs and expectations.

  • Average Revenue Per User (ARPU): Measuring the average revenue generated per user attributed to the generative AI

Core Dimensions for Responsible AI

Responsible AI isn’t just about building cool technology; it’s about building technology that’s good for everyone. These principles guide us towards ethical and beneficial AI development:

  1. Fairness: AI systems should be inclusive and avoid creating or reinforcing biases that could lead to unfair or discriminatory outcomes.

  2. Explainability: It should be possible to understand how an AI system arrived at its conclusions. This transparency helps build trust and allows for better analysis and improvement.

  3. Privacy and Security: Protecting user data is paramount. AI systems must be designed to respect privacy and safeguard sensitive information.

  4. Transparency: Open communication about how AI systems work helps stakeholders make informed decisions and fosters accountability.

  5. Veracity and Robustness: AI systems should be reliable and perform consistently, even in unexpected situations or changing environments.

  6. Governance: Clear guidelines and policies are necessary to ensure responsible AI practices are followed within organizations and across the industry.

  7. Safety: AI systems should be designed with safety in mind, minimizing potential risks and maximizing benefits for individuals and society.

  8. Controllability: We need mechanisms to monitor and guide AI behavior, ensuring it aligns with human values and intentions.

AWS AI Service Cards

AWS AI Service Cards are your go-to resource for navigating the world of AWS AI services. Think of them as a guidebook, providing essential information about each service in a clear and concise format.

These cards are designed with responsible AI principles in mind, offering a single source of truth for understanding:

  • Core Concepts: A breakdown of the fundamental ideas behind the service, making it easier to grasp its purpose and functionality.
  • Intended Use Cases and Limitations: Real-world applications of the service and scenarios where it might not be the best fit.
  • Responsible AI Design Considerations: Insights into how the service incorporates ethical considerations like fairness, transparency, and privacy.
  • Deployment and Performance Optimization: Best practices for deploying and getting the most out of the service.

With AWS AI Service Cards, you can confidently explore and implement AI solutions on AWS while adhering to responsible AI practices.

ML Development Lifecycle

Building a successful machine learning (ML) model involves more than just algorithms and data. It’s a journey with distinct stages, each crucial for achieving your desired outcome. Here’s a breakdown of the typical ML development lifecycle:

1. Business Goal Identification:

  • Start with the “why.” Clearly define the business problem you want to solve with ML. What are you trying to achieve? What questions need answers?

2. ML Problem Framing:

  • Translate the business goal into a specific ML task. Are you trying to predict a value (regression), classify data into categories (classification), or uncover hidden patterns (clustering)?

3. Data Processing:

  • Data Collection: Gather the relevant data from various sources.
  • Data Preprocessing: Clean, transform, and prepare the data for model training. This includes handling missing values, outliers, and formatting inconsistencies.
  • Feature Engineering: Select, extract, or create meaningful features that will improve the model’s performance.

4. Model Development:

  • Training: Feed the processed data to your chosen ML algorithm to learn patterns and relationships.
  • Tuning: Optimize the model’s parameters (hyperparameters) to achieve the best performance on a validation dataset.
  • Evaluation: Assess the model’s performance on a separate test dataset to estimate how well it will generalize to new, unseen data.
    • A common strategy is to split labeled data into training, validation, and testing subsets, typically with ratios like 80/10/10 or 70/15/15.

5. Model Deployment:

  • Make the model available for use. This could involve integrating it into an application, deploying it as an API, or embedding it in a device.
  • Inference and Prediction: Use the deployed model to generate predictions or insights from new data.

6. Model Monitoring:

  • Continuously track the model’s performance over time. Is it maintaining accuracy? Are there any signs of drift or degradation?

7. Model Retraining:

  • Periodically retrain the model with new data to ensure it stays relevant and accurate as the environment changes.

This lifecycle is iterative. You might need to revisit earlier stages based on feedback and new information gathered along the way.

Amazon SageMaker

Amazon SageMaker is a comprehensive machine learning (ML) service that empowers developers and data scientists to build, train, and deploy models at scale. Here’s a breakdown of its key features:

Data Preparation

SageMaker offers a suite of tools to streamline data preparation:

  • SageMaker Data Wrangler: This low-code/no-code tool provides an end-to-end solution for data import, preparation, transformation, and analysis through a user-friendly web interface.
  • SageMaker Studio Classic: For advanced users and large-scale data preparation, Studio Classic integrates with Amazon EMR and AWS Glue interactive sessions, enabling seamless data processing within notebooks.
  • SageMaker Processing API: This API allows you to run scripts and notebooks for data processing, transformation, and analysis, supporting various ML frameworks like scikit-learn, MXNet, and PyTorch in fully managed environments.

Feature Management

  • SageMaker Feature Store: This feature store helps data scientists, ML engineers, and practitioners create, share, and manage features for ML development, ensuring consistency and reusability.

Model Training

SageMaker provides flexible options for model training:

  • Built-in Algorithms and Custom Algorithms: Train models using a wide range of built-in algorithms or bring your own custom algorithms. SageMaker manages the underlying infrastructure, making training efficient and scalable.
  • SageMaker Canvas: For a low-code experience, SageMaker Canvas allows users to generate predictions without writing any code, making ML accessible to a broader audience.
  • SageMaker JumpStart: Accelerate model development with pre-trained, open-source models for various problem types, providing a starting point for your ML projects.

Model Evaluation and Tuning

SageMaker offers tools to evaluate and optimize model performance:

  • SageMaker Experiments: Experiment with different data, algorithms, and parameters while tracking the impact of changes on model accuracy.
  • SageMaker Automatic Model Tuning: Automatically find the best version of your models by running multiple training jobs with different hyperparameter combinations and evaluating their performance.

Model Deployment and Inference

SageMaker simplifies model deployment and inference:

  • Flexible Deployment Options: Deploy models to a variety of ML infrastructure with options to meet your specific inference needs.
  • SageMaker Endpoints: Deploy models as endpoints for real-time inference, making them readily available for applications and services.

Model Monitoring

SageMaker helps maintain model quality in production:

  • SageMaker Model Monitor: Monitor the quality of deployed models by detecting deviations in data quality, model quality, bias drift, and feature attribution drift. Set up continuous or scheduled monitoring to ensure ongoing performance.

With its comprehensive suite of tools and services, Amazon SageMaker simplifies the entire ML workflow, from data preparation to model deployment and monitoring, making machine learning accessible and efficient for all users.

SageMaker Built-in Algorithms

Amazon SageMaker offers a wide range of built-in algorithms to simplify and accelerate your machine learning (ML) projects. These algorithms are optimized for performance and scalability, making them suitable for various use cases. Here’s a breakdown of some key SageMaker algorithms:

Supervised Learning

  • Linear Learner: A versatile algorithm for both classification and regression tasks. It’s particularly effective for problems with large datasets and high dimensionality.
  • Factorization Machines: A powerful algorithm for recommendation systems and tasks involving sparse datasets. It excels at capturing interactions between features.
  • XGBoost: A popular gradient boosting algorithm known for its high accuracy and efficiency. It’s widely used for classification and regression tasks.
  • K-Nearest Neighbors (KNN): A simple but effective algorithm for classification and regression. It classifies data points based on the classes of their nearest neighbors in the feature space.

Unsupervised Learning

  • Clustering:
    • K-means: A widely used clustering algorithm that partitions data points into clusters based on their similarity.
    • Latent Dirichlet Allocation (LDA): A probabilistic topic modeling algorithm that uncovers hidden topics within a collection of documents.
  • Topic Modeling:
    • Latent Dirichlet Allocation (LDA): As mentioned above, LDA is also effective for topic modeling.
  • Embeddings:
    • Object2Vec: An algorithm for learning embeddings of objects, such as words, products, or users. It’s useful for tasks like recommendation systems and information retrieval.
  • Anomaly Detection:
    • Random Cut Forest: An efficient algorithm for detecting anomalies in streaming data. It’s particularly useful for applications like fraud detection and network security.
    • IP Insights: An algorithm for identifying potentially malicious IP addresses based on their activity patterns.
  • Dimension Reduction:
    • Principal Component Analysis (PCA): A widely used technique for reducing the dimensionality of data while preserving important information.

Image/Videos

  • Image Classification:
    • MXNet TensorFlow (ResNet, ImageNet…): SageMaker provides pre-trained models and frameworks like ResNet and ImageNet for image classification tasks.
  • Object Detection:
    • MXNet TensorFlow (ResNet, ImageNet…): Similar to image classification, SageMaker offers pre-trained models and frameworks for object detection.
  • Semantic Segmentation:
    • Fully Convolutional Network (FCN): An algorithm for pixel-wise classification of images, assigning a label to each pixel.
    • Pyramid Scene Parsing (PSP): An algorithm for scene parsing, understanding the context and relationships between objects in an image.
    • DeepLab V3 with ResNet: A state-of-the-art algorithm for semantic segmentation, achieving high accuracy in image segmentation tasks.

Time Series

  • DeepAR: A recurrent neural network (RNN) based algorithm for forecasting time series data. It’s useful for applications like demand forecasting and financial modeling.

Text

  • Text Classification:
    • BlazingText: A highly optimized algorithm for text classification, achieving fast training and inference speeds.
  • Word2Vec:
    • BlazingText: BlazingText can also be used for learning word embeddings, representing words as dense vectors.
  • Machine Translation:
    • Sequence to Sequence: A neural network architecture for machine translation, translating sequences of words from one language to another.
  • Topic Modeling:
    • Latent Dirichlet Allocation (LDA): As mentioned earlier, LDA is also effective for topic modeling.
    • Neural Topic Modeling (NTM): A deep learning based approach to topic modeling, uncovering hidden topics within text data.

Speech

  • Sequence to Sequence: This architecture can also be applied to speech recognition and generation tasks, processing sequences of audio data.

Model Fit

When training machine learning (ML) models, achieving the right “fit” is crucial for optimal performance. Let’s explore the concepts of overfitting, underfitting, and how bias and variance play a role.

Overfitting

Imagine a student who memorizes the entire textbook but struggles to apply the concepts to new problems. That’s overfitting! The model performs exceptionally well on the training data but fails to generalize to unseen examples. It’s like the model has become too focused on the specific details of the training data and missed the broader patterns.

Characteristics of Overfitting:

  • Low Bias: The model accurately captures the relationships in the training data.
  • High Variance: The model’s predictions are highly sensitive to fluctuations in the data, leading to inconsistent performance on new data.

Underfitting

Now picture a student who hasn’t grasped the fundamental concepts and struggles even with the textbook examples. That’s underfitting! The model performs poorly even on the training data, indicating it hasn’t captured the underlying relationships between the input features and the target variable.

Characteristics of Underfitting:

  • High Bias: The model oversimplifies the relationships in the data.
  • Low Variance: The model’s predictions are relatively consistent but inaccurate due to its limited understanding of the data.

The Balanced Fit

The sweet spot lies in finding a balance between overfitting and underfitting. A well-balanced model accurately captures the patterns in the training data while also generalizing effectively to new, unseen data.

Characteristics of a Balanced Fit:

  • Low Bias: The model accurately captures the relationships in the data.
  • Low Variance: The model produces consistent and accurate predictions across different datasets.

Bias and Variance: The Trade-off

Think of bias as the error introduced by approximating a real-world problem with a simplified model. High bias leads to underfitting. Variance, on the other hand, represents the model’s sensitivity to fluctuations in the data. High variance leads to overfitting.

The goal in ML is to minimize both bias and variance, achieving a model that accurately reflects the true relationship in the data and produces reliable predictions across different datasets. This “Goldilocks” zone of low bias and low variance represents the balanced fit we strive for.

Model Evaluation Metrics

Evaluating your machine learning (ML) model’s performance is crucial to ensure it’s meeting your goals. Different types of ML tasks require different evaluation metrics. Here’s a breakdown of common metrics for classification and regression problems:

Classification Metrics

  • Accuracy: The overall proportion of correctly classified instances.

  • Precision: The proportion of true positive predictions among all positive predictions. It focuses on minimizing false positives.   

  • Recall (Sensitivity): The proportion of true positive predictions among all actual positive instances. It emphasizes minimizing false negatives.

  • F1-Score: The harmonic mean of precision and recall, providing a balanced measure that considers both false positives and false negatives.   

  • AUC-ROC (Area Under the Receiver Operating Characteristic Curve): A measure of the model’s ability to distinguish between classes across different thresholds.

Confusion Matrix

A confusion matrix is a helpful tool for visualizing the performance of a classification model. It shows the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

  • TP True Positive
  • TN True Negative
  • FP False Positive
  • FN False Negative

Formulas

Accuarcy=TP+TNTP+FP+TN+FN\text{Accuarcy} = \frac {TP + TN}{TP + FP + TN + FN}

Precision (PPV)=TPTP+FP\text{Precision (PPV)} = \frac {TP}{TP+FP}

Recall (TPR)=TPTP+FN\text{Recall (TPR)} = \frac {TP}{TP+FN}

F1 score=2×Precision×RecallPrecision+Recall\text {F1 score} = \frac {2 \times \text{Precision} \times \text{Recall}}{\text{Precision} +\text{Recall}}

In English

  • Precision: Imagine you’re searching for relevant documents in a database. Precision tells you how many of the documents you retrieved are actually relevant. It’s like a measure of accuracy in your selection process.

  • Recall: Now, think about how many of the truly relevant documents you were able to find. Recall measures how comprehensive your search was in capturing all the relevant items.

  • AUC-ROC: This metric takes a broader view. It evaluates the model’s ability to distinguish between different classes across various thresholds. Think of it as a measure of how well the model can separate different categories.

    • The AUC-ROC curve helps visualize the trade-off between true positives and false positives at different thresholds. This allows you to choose a threshold that best suits your specific needs, balancing the importance of correctly identifying positive cases with the risk of misclassifying negative ones.

Regression Metrics

In regression tasks, where the goal is to predict continuous values, we use specific metrics to evaluate model performance:

Mean Squared Error (MSE):

  • Calculates the average squared difference between predicted and actual values
  • A lower MSE indicates better accuracy (predictions closer to true values)
  • Process: take the difference between prediction and actual value, square it, then sum all squared differences
  • The smaller the MSE, the better the model’s predictive accuracy

R-squared (Coefficient of Determination):

  • Represents the proportion of variance in the target variable explained by the model
  • Ranges from 0 to 1, with higher values indicating better fit
  • An R-squared of 1 means perfect prediction
  • Provides a measure of the model’s goodness of fit to the data

While MSE focuses on prediction error magnitude, R-squared shows how well the model explains variability in the data. Both metrics offer complementary perspectives on model performance.

MLOps

MLOps combines people, technology, and processes to deliver collaborative ML solutions. It refers to operationalizing and streamlining the end-to-end machine learning lifecycle from development and deployment to monitoring and maintenance, ensuring models are systematically deployed, monitored, and retrained.

Goal of MLOps

The primary goal is to get ML workloads into production and keep them operating. MLOps adopts DevOps principles for machine learning to:

  • Increase model development pace through automation
  • Improve quality metrics through testing and monitoring
  • Promote collaboration between data scientists, data engineers, software engineers, and IT operations
  • Provide transparency, explainability, audibility, and security through model governance

Key Principles of MLOps

  • Version Control: Tracking changes to code, data, and models
  • Automation: Streamlining repetitive processes
  • CI/CD: Continuous integration and continuous deployment
  • Model Governance: Requires collaboration between data scientists, engineers, and business stakeholders with clear documentation and communication channels. Includes protecting sensitive data, securing access, meeting compliance requirements, and implementing structured review processes to check for fairness, bias, and ethics.

Prompting

Inference Parameters

  • Temperature: Controls the randomness or creativity of the model’s output

  • Top P: Controls text diversity by limiting word choices based on their probabilities (scale: 0-1)

    • Example: With top p=0.250, the model only considers words in the top 25% of the probability distribution
  • Top K: Limits consideration to only the k most probable words, regardless of their probability percentages

    • Example: With top k=10, the model only considers the 10 most probable next words, creating more focused output

Top-p Example

Assume a language model predicts the following probabilities for the next token:

TokenProbability
A0.4
B0.3
C0.1
D0.1
E0.05
F0.05

With top-p sampling and p=0.8:

  • Sort the tokens: A (0.4), B (0.3), C (0.1), D (0.1), E (0.05), F (0.05).
  • Calculate cumulative probabilities:
    • A: 0.4
    • A + B: 0.7
    • A + B + C: 0.8
    • A + B + C + D: 0.9
  • Stop when cumulative probability exceeds 0.8.
  • The top-p set is {A, B, C}.

Chain-of-Thought (CoT) Prompting

Use the phrase “Think step by step.”

Prompt Misuses and Risks

This chapter discusses various types of prompt misuses and risks associated with foundation models (FMs) in AI:

  1. Poisoning, Hijacking, and Prompt Injection:

    • Poisoning: Intentionally introducing malicious or biased data into the model’s training dataset
    • Hijacking and Prompt Injection: Influencing model outputs by embedding specific instructions within prompts, potentially for malicious purposes or customization
  2. Exposure and Prompt Leaking:

    • Exposure: Risk of revealing sensitive information from the training corpus during inference
    • Prompt Leaking: Unintentional disclosure of prompts or inputs used within a model, which can reveal how the model works
  3. Jailbreaking:

    • Circumventing a model’s constraints and safety measures to gain unauthorized access or functionality
    • Often involves crafting prompts to bypass ethical and safety constraints

Amazon Bedrock

Capabilities

The capabilities of Amazon Bedrock include the following:

  • Foundation models that include a choice of base FMs and customized FMs
  • Playgrounds for chat, text, and images with quick access to FMs for experimentation and use through the console
  • Safeguards such as watermark detection and guardrails
  • Orchestration and automation for your application with knowledge bases and agents
  • Assessment and deployment with model evaluation and provisioned throughput

Build with comprehensive data protection and privacy

With Amazon Bedrock, your data—including prompts, information used to supplement prompts, FM responses, and customized FMs—remains in the AWS Region where the API call is processed. Your data is encrypted in transit with TLS 1.2 and at rest with service-managed AWS Key Management Service (AWS KMS) keys.

You can use AWS PrivateLink with Amazon Bedrock to establish private connectivity between your FMs and on-premises networks without exposing your traffic to the internet. In addition, you can customize FMs privately, retaining control over how your data is used and encrypted. Amazon Bedrock makes a separate copy of the base FM model and trains this private copy of the model.

PII data

You can supply personally identifiable information (PII) data in input prompts to Amazon Titan or third-party models. The third-party models will have their own ways of handling the data, but Amazon Titan will always accept it as input.

When Amazon Titan provides output from the prompt, any PII data present that was also in the input prompt will remain in cleartext in the output. Any PII data in the output that was not present in the input prompt will be masked. For more information about the handling of PII data, see the third-party model provider’s EULA.

Cost

Two models:

  • On-demand
  • Provisioned throughput

Amazon Q Business

 Amazon Q Business is a generative AI-powered assistant that can answer questions, generate content, create summaries, and complete tasks—all based on the information in your enterprise. Amazon Q Business is delivered using a built-in web experience or through APIs. This helps business users leverage the power of generative AI without any overhead.

Connects 40+ enterprise applications

Amazon Q Business has over 40 built-in connectors to popular enterprise applications and document repositories, including Amazon Simple Storage Service (Amazon S3), Salesforce, Google Drive, Microsoft 365, ServiceNow, Gmail, Slack, Atlassian, and Zendesk. This helps with faster integrations to your enterprise systems, providing a tailored response to user queries. The connectors include both cloud-based systems and on-premise systems.

Cost

Per user. 3forlite,3 for lite, 20 for pro.

Capabilities and Challenges for GenAI

Capabilities

  • Adaptability
  • Responsiveness
  • Simplicity
  • Creativity and exploration
  • Data efficiency
  • Personalization
  • Scalability

Challenges

  • Regulatory violations
  • Social risks
  • Data security and privacy concerns
  • Toxicity
  • Hallucinations
  • Interpretability
  • Nondeterminism

Business use case

Structured narrative describing system behavior from stakeholder perspective.

Parts of a use case:

  • Use case name: Short, descriptive identifier
  • Brief description: High-level summary of purpose
  • Actors: Entities interacting with system (human or external)
  • Preconditions: Required conditions before initiation
  • Basic flow: Step-by-step description of successful scenario
  • Alternative flows: Scenarios for exceptional conditions
  • Postconditions: Required state after completion
  • Business rules: Governing policies and constraints
  • Nonfunctional requirements: Performance, security, usability considerations
  • Assumptions: Context necessary for validity
  • Notes: Additional helpful information

Key Aspects of Prompt Engineering

  • Design: Creating precise, clear, and contextually appropriate prompts that clearly convey the intended task or desired output to the model

  • Augmentation: Enhancing prompts with supplementary information or parameters, including examples, demonstrations, or specific guidelines, to steer the model’s response generation

  • Tuning: Systematically improving and modifying prompts based on model performance and output quality, using human feedback or quantitative metrics

  • Ensembling: Utilizing combinations of different prompts or generation approaches to enhance reliability and overall output quality

  • Mining: Discovering and identifying highly effective prompts through methods such as prompt exploration, automated prompt creation, or selection from existing prompt collections

Prompt techniques

Prompt engineering techniques are strategies used to guide generative AI models. Some common prompt engineering techniques include the following:

  • Zero-shot prompting
  • Few-shot prompting
  • Chain-of-thought (CoT) prompting
  • Self-consistency
  • Tree of thoughts (ToT)
  • Retrieval Augmented Generation (RAG)
  • Automatic Reasoning and Tool-use (ART)
  • ReAct prompting

Fine Tuning

Need “domain-specific terminology” then “fine tuning” is the answer.

Fine-tuning is a process used to enhance the performance of a pre-trained language model by further training it on a specific task or domain-specific dataset. This allows the model to better adapt to the requirements of a particular business use case, improving its ability to understand and process information relevant to that context. Although foundation models (FMs) are already capable due to their pre-training, fine-tuning can significantly enhance their performance.

There are two main approaches to fine-tuning a model:

  1. Instruction Fine-tuning: This involves using examples to show how the model should respond to specific instructions. Prompt tuning is a subset of instruction fine-tuning, where specific prompts are used to guide the model’s responses.

  2. Reinforcement Learning from Human Feedback (RLHF): This method uses human feedback to train the model, aligning it more closely with human preferences and improving its ability to produce desired outcomes.

Evaluating Foundation Models (FM)

Types

1. Human Evaluation

  • Involves human interaction with the FM
  • Assesses performance based on specific criteria
  • Tasks: open-ended conversations, question-answering, text generation
  • Provides qualitative feedback on coherence, relevance, factuality, and overall quality
  • Considered gold standard but time-consuming and expensive

2. Benchmark Datasets

  • Curated data collections for evaluating AI systems
  • Cover various topics, complexities, and linguistic phenomena
  • Examples:
    • GLUE (General Language Understanding Evaluation)
    • SuperGLUE
    • SQuAD (Stanford Question Answering Dataset)
    • WMT (Workshop on Machine Translation)
  • Provide standardized comparison and progress tracking

3. Automated Metrics

  • Quick and scalable evaluation method
  • Measure specific aspects of model outputs:
    • Perplexity (predicting next token)
    • BLEU score (machine translation)
    • F1 score (classification or entity recognition)
  • Useful for rapid iterations and fine-tuning
  • May not capture nuances of human language

Key Considerations

  • Align model capabilities with organization’s requirements and goals
  • Balance between different evaluation methods
  • Consider limitations of each method when interpreting results

Evaluation Metrics

Metrics like ROUGE, BLEU, and BERTScore provide an initial assessment of the foundation model’s capabilities.

Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics used for evaluating automatic summarization and machine translation systems. It measures the quality of a generated summary or translation by comparing it to one or more reference summaries or translations.

Bilingual Evaluation Understudy (BLEU) is a metric used to evaluate the quality of machine-generated text, particularly in the context of machine translation. It measures the similarity between a generated text and one or more reference translations, considering both precision and brevity.

BERTScore is a metric that evaluates the semantic similarity between a generated text and one or more reference texts. It uses pre-trained Bidirectional Encoder Representations from Transformers (BERT) models to compute contextualized embeddings for the input texts, and then calculates the cosine similarity between them.

The choice depends on what your system is designed to do:

  • If it’s a translation system, BLEU would be most appropriate.
  • If it’s generating natural language, BERTScore might be best.
  • If it’s summarizing text, ROUGE would be suitable.
  • If it’s a general language model, Perplexity would be a good choice.

Vector Embeddings

Vector embedding is a technique that transforms words, phrases, or any data into numerical vectors in a high-dimensional space, where semantic relationships are preserved as geometric relationships. These dense numerical representations capture the meaning and context of the original data, allowing machines to understand similarities between concepts by measuring the distance between vectors. When text is converted to embeddings, words with similar meanings cluster together, enabling AI systems to perform semantic search, recommendations, and other language understanding tasks. Modern embedding models like those used in large language models can represent complex linguistic nuances, contextual relationships, and even cross-modal connections between text and other media types, making them fundamental to how AI systems process and understand information.

Tokenization, Embeddings and Vectors

Tokenization

Tokenization is the process of breaking down text into smaller units called tokens.

Visual representation:

Original text: "I love machine learning!"
Tokenized: ["I", "love", "machine", "learning", "!"]
Token IDs: [101, 602, 403, 204, 22, 98]

Tokens can be words, subwords, or even characters depending on the tokenization strategy.

Embeddings

Embeddings are dense vector representations of tokens in a continuous vector space. They capture semantic meaning.

Visual representation:

Word: "cat"
Embedding: [0.2, -0.5, 0.8, 0.1, ...]
^ ^ ^ ^
| | | |
dimension in vector space

Each number represents the word’s position along a different dimension in the vector space. Similar words will have similar embeddings.

Vectors

Vectors are ordered lists of numbers used to represent data in machine learning.

Visual representation:

^
|
0.8| *
|
0.2| *
|___________>
0.2 0.8

This 2D plot shows a vector [0.2, 0.8]. In machine learning, vectors often have many more dimensions.

In the context of embeddings:

  • Each token (word) is represented by a vector
  • The vector’s values represent the word’s position in a multi-dimensional space
  • Words with similar meanings will be closer together in this space

For example:

"cat" -> [0.2, 0.8, -0.3, 0.5, ...]
"dog" -> [0.3, 0.7, -0.2, 0.6, ...]
"tree" -> [-0.1, -0.2, 0.9, 0.1, ...]

Here, “cat” and “dog” would be closer in the vector space than either is to “tree”.

These concepts work together in NLP tasks:

  1. Text is tokenized into individual units
  2. Each token is converted into an embedding (vector)
  3. These vectors are then used as inputs for various machine learning models

This allows machines to process and understand text in a mathematically meaningful way.

Embeddings vs Vectors

Embeddings are a specific use of vectors in the context of representing discrete objects (like words) in a continuous vector space. Embeddings are always vectors, but not all vectors are embeddings. Vectors can represent any kind of data, not just text or words.

Key differences:

  1. Purpose:
    • Vectors: General mathematical objects used in various contexts.
    • Embeddings: Specifically used to represent discrete objects in a continuous space.
  2. Meaning:
    • Vectors: Can represent anything (coordinates, forces, etc.).
    • Embeddings: Represent semantic relationships between objects.
  3. Context:
    • Vectors: Used broadly in mathematics and computer science.
    • Embeddings: Primarily used in natural language processing and machine learning.
  4. Learning:
    • Vectors: Can be manually defined or calculated.
    • Embeddings: Usually learned by a model during training.

Storing vectors

The core function of vector databases is to compactly store billions of high-dimensional vectors representing words and entities. Vector databases provide ultra-fast similarity searches across these billions of vectors in real time. 

The most common algorithms used to perform the similarity search are k-nearest neighbors (k-NN) or cosine similarity.

Amazon Web Services (AWS) offers the following viable vector database options:

  • Amazon OpenSearch Service (provisioned)
  • Amazon OpenSearch Serverless
  • pgvector extension in Amazon Relational Database Service (Amazon RDS) for PostgreSQL
  • pgvector extension in Amazon Aurora PostgreSQL-Compatible Edition
  • Amazon Kendra

Model Evaluation with Benchmark Dataset

0.3
nomnoml
#arrowSize: 1
#direction: down
#fill: #eeeeee;#ffffff
#fillArrows: false
#font: Calibri
#fontSize: 12
#leading: 1.2
#lineWidth: 1
#padding: 10
#spacing: 40
#stroke: #333333
#title: Nomnoml Diagram
#zoom: 1
[Questions] -> [<frame>Model to evaluate]
[Model to evaluate] -> [Generated answers]
[Generated answers] -> [Judge model]
[Questions] -> [Benchmark dataset]
[Benchmark dataset] -> [Judge model]
[Judge model] -> [Grading score]

Fine Tuning Approaches

Instruction Tuning

Instruction tuning is a method of refining a language model by exposing it to a new dataset. This dataset consists of specific instructions paired with their intended outcomes. The goal is to enhance the model’s ability to interpret and act on user commands with greater precision.

This approach is particularly beneficial for applications that require direct interaction with users, such as digital assistants and conversational AI systems.

Reinforcement Learning From Human Feedback (RLHF)

Reinforcement Learning from Human Feedback (RLHF) is a more complex fine-tuning strategy. It begins with supervised learning, training the model to generate responses that mimic human communication. Following this initial phase, the model undergoes a reinforcement learning process. This second stage utilizes a reward model, constructed from human evaluations, to guide the model towards producing more desirable outputs.

RLHF is especially valuable for applications where aligning the model’s responses with human values and expectations is crucial, particularly in contexts dealing with sensitive or nuanced topics.

Here’s a rephrasing of the text with ### headers for each method:

Adapting Models for Specific Domains

This technique involves tailoring a model to a particular industry or field by fine-tuning it on sector-specific data. For instance, an AI for the legal sector might be trained on legal documents, while a healthcare AI would focus on medical records. This targeted approach significantly enhances the model’s ability to handle domain-specific tasks, resulting in more accurate and contextually appropriate responses.

Transfer Learning

Transfer learning is a strategy where a model’s knowledge from one task is applied to a different, but related, task. In the context of foundational models, this typically means starting with a model trained on a broad, diverse dataset and then refining it on a smaller, more focused dataset. This method efficiently leverages the general knowledge gained during initial training and applies it to more specialized areas, reducing the need for extensive additional training.

Continuous Pre-training

This approach extends a pre-trained model’s learning phase by regularly introducing new and current data. The goal is to keep the model up-to-date with the latest information, language use, trends, and discoveries. This ongoing learning process ensures that the model’s outputs remain relevant and accurate over time, adapting to the evolving nature of language and knowledge.

Data Preparation for Fine-tuning

Initial training of foundational models uses vast, diverse datasets for broad understanding. Fine-tuning, however, is more targeted:

  • Specificity: Focused datasets relevant to specific tasks or problems
  • High relevance: Data directly related to desired outputs (e.g., legal documents for legal AI)
  • Quality over quantity: Smaller, well-curated datasets can achieve significant improvements

Key steps in fine-tuning data preparation:

  1. Data curation: Rigorous selection of highly relevant data
  2. Labeling: Accurate and relevant labeling to guide model specialization
  3. Governance and compliance: Ensure adherence to industry-specific regulations
  4. Representativeness and bias checking: Avoid introducing or perpetuating biases
  5. Feedback integration: Incorporate user or expert feedback, especially for methods like RLHF

Model evaluation

Specific metrics assess the quality of language model outputs compared to human-written standards. Three common metrics are ROUGE, BLEU, and BERTScore.

ROUGE

ROUGE evaluates automatic text summarization and machine translation by counting overlapping units between generated and reference texts.

Two ways to use ROUGE:

• ROUGE-N: Measures n-gram overlap between the generated text and the reference texts. • ROUGE-L: Uses longest common subsequence between the generated text and the reference texts.

ROUGE is simple, interpretable, and correlates well with human judgment, especially for summary recall.

BLEU

BLEU evaluates machine translation quality by comparing to human translations. It measures N-gram precision and applies a brevity penalty.

BLEU focuses on precision, evaluating at sentence level using various n-grams. It’s popular for its ease of use but has limitations in assessing fluency and grammar.

Unlike ROUGE, which focuses on recall, BLEU is fundamentally a precision metric. It checks how many words or phrases in the machine translation appear in the reference translations.

The BERTScore

BERTScore uses pre-trained contextual embeddings to evaluate text generation. It computes cosine similarity between embeddings of words in candidate and reference texts.

BERTScore captures semantic similarity, making it less sensitive to minor paraphrasing. It’s often used alongside BLEU and ROUGE for comprehensive assessment.

What is n-gram?

An n-gram is a contiguous sequence of n items from a given sample of text or speech. In the context of natural language processing and text analysis, these items are typically words, but they can also be characters or other linguistic units. Here’s a brief explanation:

Types of n-grams:

  • Unigram (1-gram): Single word
  • Bigram (2-gram): Two consecutive words
  • Trigram (3-gram): Three consecutive words
  • And so on…

Example:

For the sentence “The cat sat on the mat”:

  • Unigrams: “The”, “cat”, “sat”, “on”, “the”, “mat”
  • Bigrams: “The cat”, “cat sat”, “sat on”, “on the”, “the mat”
  • Trigrams: “The cat sat”, “cat sat on”, “sat on the”, “on the mat”

Use in NLP:

  • Language modeling
  • Text classification
  • Machine translation
  • Speech recognition

Importance:

N-grams capture local word order and are useful for understanding context and predicting next words in a sequence.

N-grams are fundamental in many natural language processing tasks and metrics, including those used in model evaluation like BLEU and ROUGE.

Security, Compliance, and Governance for AI Solutions

Concepts of security, governance, and compliance in organizations

  • Security: Ensure that confidentiality, integrity, and availability are maintained for organizational data and information assets and infrastructure. This function is often called information security or cybersecurity in an organization.

  • Governance: Ensure that an organization can add value and manage risk in the operation of business.

  • Compliance: Ensure normative adherence to requirements across the functions of an organization.

Security Standards

National Institute of Standards and Technology (NIST)

The NIST 800-53 security controls are commonly used for U.S. federal information systems. Federal information systems typically need to undergo a formal evaluation and approval process to verify they have adequate safeguards in place to protect the confidentiality, integrity, and availability of the information and information systems.

For more information, see National Institute of Standards and Technology (NIST)(opens in a new tab).

European Union Agency for Cybersecurity (ENISA)

European Union Agency for Cybersecurity (ENISA) contributes to the EU’s cyber policy. It boosts trust in digital products, services, and processes by drafting cybersecurity certification schemes. It cooperates with EU countries and bodies and helps prepare for future cyber challenges. 

For more information, see Operational Best Practices for ENISA Cybersecurity Guide for SMEs(opens in a new tab).

International Organization for Standardization (ISO)

ISO is a security standard that outlines recommended security management practices and comprehensive security controls, based on the guidance provided in the ISO/IEC 27002 best practice document.

For more information, see the AWS compliance page for ISO(opens in a new tab).

AWS System and Organization Controls (SOC)

The AWS System and Organization Controls (SOC) Reports are independent assessments conducted by third parties that show how AWS has implemented and maintained key compliance controls and objectives.

For more information, see the AWS compliance page for SOC(opens in a new tab).

Health Insurance Portability and Accountability Act (HIPAA)

AWS empowers covered entities and their business associates under the U.S. HIPAA regulations to use the secure AWS environment for processing, maintaining, and storing protected health information.

For information on how to use AWS for the processing and storage of health-related data, see the whitepaper Architecting for HIPAA Security and Compliance on Amazon Web Services(opens in a new tab).

General Data Protection Regulation (GDPR)

The European Union’s GDPR safeguards the fundamental right of EU citizens to privacy and the protection of their personal information. The GDPR establishes stringent requirements that raise and unify the standards for data protection, security, and compliance across the EU.

For more information, see General Data Protection Regulation (GDPR) Center(opens in a new tab).

Payment Card Industry Data Security Standard (PCI DSS)

The PCI DSS is a private information security standard that is managed by the PCI Security Standards Council. This council was established by a group of major credit card companies, including American Express, Discover Financial Services, JCB International, Mastercard, and Visa.

For more information, see the AWS compliance page for PCI DSS(opens in a new tab).

AI standards compliance

There are several key ways in which AI standards compliance differs from traditional software and technology requirements. The following are some issues to consider.

Complexity and opacity AI systems, especially LLMs and generative AI, can be complex with opaque decision-making. This makes auditing and understanding outputs challenging for compliance.

Dynamism and adaptability AI systems often adapt over time, even after deployment. This makes applying static standards and mandates difficult.

Emergent capabilities Unexpected capabilities can arise from complex interactions within AI systems. This requires ongoing monitoring as systems become more advanced.

Unique risks AI poses novel risks like algorithmic bias, privacy violations, misinformation, and job displacement. Traditional requirements may not adequately address these.

Algorithmic bias Systematic errors or unfair prejudices in AI outputs, often due to biased training data or human biases in development.

Algorithm accountability The principle that algorithms should be transparent, explainable, and subject to oversight. Laws like the EU’s AI Act aim to ensure AI respects human rights and promotes fairness.

Regulated workloads

Regulated is a common term used to indicate that a workload might need special consideration, because of some form of compliance that must be achieved.

This term often refers to customers who work in industries with high degrees of regulatory compliance requirements or high industrial demands. 

Some example industries are as follows:

  • Financial services
  • Healthcare
  • Aerospace

AWS Services for Governance and Compliance

AWS Config Provides detailed view of AWS resource configurations, relationships, and changes over time. Helps with resource administration, auditing, compliance, and change management.

Amazon Inspector Continuously scans AWS workloads for software vulnerabilities and network exposure. Creates findings for discovered issues and provides risk scores.

AWS Audit Manager Automates evidence collection for auditing AWS usage. Helps assess controls, manage stakeholder reviews, and streamline risk and compliance management.

AWS Artifact Offers on-demand downloads of AWS security and compliance documents like ISO certifications and SOC reports for audit purposes.

AWS CloudTrail Records AWS account activity as events for auditing, governance, and compliance. Enables analysis and response to account actions across AWS infrastructure.

AWS Trusted Advisor Evaluates AWS environments using best practice checks across various categories. Recommends actions to optimize costs, improve security, performance, and operational excellence.

Data management concepts for AI workloads

  1. Data lifecycles: Management of data from creation to disposal, including collection, processing, storage, consumption, and archiving.

  2. Data logging: Systematic recording of AI workload processing, including inputs, outputs, performance metrics, and system events.

  3. Data residency: Physical location of data storage and processing, considering compliance, sovereignty, and proximity to compute resources.

  4. Data monitoring: Ongoing observation of data quality, anomalies, and data drift to ensure relevance and representativeness.

  5. Data analysis: Methods to understand data characteristics and patterns, including statistical analysis, visualization, and exploratory data analysis (EDA).

  6. Data retention: Policies defining data storage duration, influenced by regulations, historical data needs for model retraining, and storage costs.

The OWASP Top 10 for LLMs

The Open Web Application Security Project (OWASP) Top 10 is the industry standard list of the top 10 vulnerabilities that can impact a generative AI LLM system. These vulnerabilities are as follows:

  • Prompt injection: Malicious user inputs that can manipulate the behavior of a language model

  • Insecure output handling: Failure to properly sanitize or validate model outputs, leading to security vulnerabilities

  • Training data poisoning: Introducing malicious data into a model’s training set, causing it to learn harmful behaviors

  • Model denial of service: Techniques that exploit vulnerabilities in a model’s architecture to disrupt its availability

  • Supply chain vulnerabilities: Weaknesses in the software, hardware, or services used to build or deploy a model

  • Sensitive information disclosure: Leakage of sensitive data through model outputs or other unintended channels

  • Insecure plugin design: Flaws in the design or implementation of optional model components that can be exploited

  • Excessive agency: Granting a model too much autonomy or capability, leading to unintended and potentially harmful actions

  • Overreliance: Over-dependence on a model’s capabilities, leading to over-trust and failure to properly audit its outputs

  • Model theft: Unauthorized access or copying of a model’s parameters or architecture, allowing for its reuse or misuse

Data lineage

Data lineage in machine learning refers to the tracking and documentation of data’s journey throughout its lifecycle in an ML system. Here are the key points about data lineage in ML:

  1. Definition: Data lineage tracks the origin, movement, transformations, and destinations of data within ML pipelines and systems.

  2. Purpose:

    • Provides visibility into data flow and transformations
    • Enables tracing errors back to their root cause
    • Supports reproducibility of ML experiments and models
    • Facilitates debugging and auditing of ML processes
  3. What is tracked:

    • Code versions (e.g., Git commit hashes)
    • Configurations and hyperparameters
    • Input data sources and versions
    • Model versions
    • Transformations applied to data
    • Resources used (e.g., GPU types, CPU count)
    • Ownership information
  4. Benefits:

    • Improves experiment tracking and bookkeeping
    • Enhances reproducibility of ML experiments
    • Supports debugging and error tracing
    • Enables compliance and auditing in regulated industries
    • Facilitates model explainability and transparency
  5. Visualization: Data lineage is often represented visually, showing the flow of data from source to destination, including transformations and intermediate steps.

  6. Tools: Various ML platforms and tools offer data lineage tracking capabilities, such as Neptune, Sematic, and others.

  7. Relation to data provenance: Data lineage is considered a subset of data provenance, which tracks not only the data itself but also the systems and processes influencing the data.

  8. Implementation: Data lineage tracking should ideally be automated by ML platforms rather than manually maintained, due to the complexity and volume of information involved.

  9. Scope: The granularity of data lineage can vary from high-level system interactions to detailed tracking of individual data points and their transformations.