Identify the data within your workload
Identifying the data within your workload is the first step in the data classification process, which helps determine the sensitivity, criticality, and legal or compliance requirements associated with that data. This identification process includes understanding the types of data processed, the business processes they support, where the data is stored, and who the data owner is. Proper data identification is essential for applying appropriate security controls, ensuring compliance, and safeguarding sensitive information.
- Understand the types of data: Begin by identifying the types of data that your workload processes. This includes personally identifiable information (PII), financial records, intellectual property, protected health information (PHI), and any other sensitive information. The nature of the data determines its classification and the protection it needs.
- Determine data location and storage: Identify where the data is stored within your workload. This could include Amazon S3 buckets, databases (e.g., Amazon RDS, Amazon DynamoDB), file systems, or other storage services. Mapping where sensitive data resides lets you apply security controls, such as encryption, access control, and monitoring, to the correct locations.
- Identify business processes associated with the data: Understand which business processes are associated with the data. For example, is the data used for transaction processing, analytics, customer management, or other critical business functions? Understanding the business context helps you prioritize data protection efforts based on the potential impact of data compromise.
- Determine data ownership: Assign data ownership to specific individuals or teams within the organization. Data owners are responsible for ensuring that the correct security, retention, and access controls are applied to their data. Identifying data owners is essential for governance and accountability.
- Consider legal and compliance requirements: Identify any legal or regulatory requirements that apply to the data in your workload. Depending on the type of data and the regions in which your business operates, you may need to comply with regulations such as GDPR, HIPAA, CCPA, or industry-specific standards like PCI DSS. Understanding these requirements helps guide your data protection and retention strategies.
- Define applicable data controls: Based on the type, location, and business context of the data, define the necessary data controls. These may include encryption (both in transit and at rest), access controls, audit logging, and retention policies. Ensuring that appropriate controls are applied from the start of the data classification process helps safeguard the data throughout its lifecycle.
- Prepare for data classification: After identifying all relevant data within your workload, proceed to classify the data based on its criticality and sensitivity. Classification helps you categorize data and apply the correct protection measures, such as encryption, access restrictions, and monitoring.
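The identification steps above can be captured as one record per data set. The following is a minimal sketch, not a prescribed schema: the `DataAsset` fields, the `BASELINE_CONTROLS` set, and the `identification_gaps` helper are all hypothetical names chosen for illustration, and the sketch treats an empty `regulations` set as "not yet reviewed".

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One row of a data inventory (all field names are illustrative)."""
    name: str
    data_types: set[str]        # e.g. {"PII", "financial"}
    location: str               # e.g. an S3 bucket or database identifier
    business_process: str       # e.g. "transaction processing"
    owner: str = ""             # accountable individual or team
    regulations: set[str] = field(default_factory=set)  # empty = not yet reviewed
    controls: set[str] = field(default_factory=set)


# Hypothetical baseline taken from the controls named in the list above.
BASELINE_CONTROLS = {
    "encryption_at_rest",
    "encryption_in_transit",
    "access_control",
    "audit_logging",
    "retention_policy",
}


def identification_gaps(asset: DataAsset) -> list[str]:
    """Return what is still missing before the asset is ready for classification."""
    gaps = []
    if not asset.data_types:
        gaps.append("data types not identified")
    if not asset.owner:
        gaps.append("no data owner assigned")
    if not asset.regulations:
        gaps.append("legal and compliance requirements not reviewed")
    missing = BASELINE_CONTROLS - asset.controls
    if missing:
        gaps.append("missing controls: " + ", ".join(sorted(missing)))
    return gaps


# Example entry; the bucket name and team name are made up.
orders = DataAsset(
    name="customer-orders",
    data_types={"PII", "financial"},
    location="s3://example-orders-bucket",
    business_process="transaction processing",
    owner="payments-team",
    regulations={"GDPR", "PCI DSS"},
    controls={"encryption_at_rest", "access_control"},
)

for gap in identification_gaps(orders):
    print(gap)
```

Running the check across every entry gives a simple worklist of what remains to be identified before classification begins.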
Supporting Questions:
- What types of data does your workload process, and how sensitive is that data?
- Where is your data stored, and what security controls are in place to protect it?
- Who are the data owners, and how are they responsible for enforcing data controls?
Roles and Responsibilities:
Data Owner:
- Responsibilities:
  - Identify and document the types of data under their control.
  - Ensure that the appropriate data controls, such as encryption and access policies, are applied.
  - Collaborate with legal and compliance teams to ensure the data meets regulatory requirements.
Cloud Administrator:
- Responsibilities:
  - Map the data storage locations within AWS services (e.g., Amazon S3, Amazon RDS).
  - Apply technical controls, such as encryption and access policies, to protect sensitive data.
  - Monitor data access and implement logging for compliance and security audits.
Artefacts:
- Data Inventory: A detailed list of the data types processed in the workload, their storage locations, and the associated business processes.
- Data Ownership Documentation: Records of the individuals or teams responsible for managing specific data sets, along with their responsibilities.
- Legal and Compliance Requirements Checklist: Documentation outlining the applicable legal and regulatory requirements that affect the data in the workload.
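As one way to connect these artefacts, a first draft of the checklist can be derived from the inventory. The sketch below assumes a simple list-of-dicts inventory and a hypothetical `CANDIDATE_REGULATIONS` mapping; the mapping is only a starting point for review, since legal and compliance teams decide which requirements actually apply. The `cardholder` data type and all data set and team names are invented for the example.

```python
import json

# Assumed starting points only; not a statement of which laws apply.
CANDIDATE_REGULATIONS = {
    "PII": ["GDPR", "CCPA"],
    "PHI": ["HIPAA"],
    "cardholder": ["PCI DSS"],
}


def build_checklist(inventory):
    """Group inventory entries under each regulation that may apply to them."""
    checklist = {}
    for entry in inventory:
        for data_type in entry["data_types"]:
            for regulation in CANDIDATE_REGULATIONS.get(data_type, []):
                checklist.setdefault(regulation, set()).add(entry["name"])
    # Sort keys and names so the output is stable and easy to review.
    return {reg: sorted(names) for reg, names in sorted(checklist.items())}


inventory = [
    {"name": "customer-profiles", "data_types": ["PII"], "owner": "crm-team"},
    {"name": "patient-records", "data_types": ["PII", "PHI"], "owner": "clinical-team"},
    {"name": "payment-tokens", "data_types": ["cardholder"], "owner": "payments-team"},
]

checklist = build_checklist(inventory)
print(json.dumps(checklist, indent=2))
```

Keeping the checklist generated from the inventory, rather than maintained by hand, means a newly added data set shows up for compliance review automatically.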
Relevant AWS Services:
AWS Data Management and Security Services:
- Amazon S3: Stores data with configurable access controls and encryption options, allowing you to securely manage and protect sensitive data.
- AWS Key Management Service (KMS): Creates and manages the encryption keys used to protect sensitive data stored in AWS services, with key policies that control who can use each key.
- Amazon Macie: Helps you identify sensitive data (such as PII) stored in Amazon S3 buckets and provides insights into data classification and protection.
- AWS Identity and Access Management (IAM): Allows you to define and enforce access controls for data within your workload, ensuring only authorized users have access.
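- AWS CloudTrail: Records API activity in your account, allowing you to audit changes to resources and monitor compliance with data protection policies. Management events are logged by default; data events, such as object-level access in Amazon S3, must be enabled explicitly.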
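To illustrate how encryption and access controls can be expressed for a storage location, the following sketch builds an Amazon S3 bucket policy as a Python dictionary. The `bucket_policy` helper and the bucket name are hypothetical; the condition keys `aws:SecureTransport` and `s3:x-amz-server-side-encryption` are standard policy condition keys. Treat it as an example of the approach, not a complete or recommended policy for any particular workload.

```python
import json


def bucket_policy(bucket_name: str) -> dict:
    """Build a policy that denies non-TLS access and uploads not using SSE-KMS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Enforce encryption in transit for every request.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Enforce encryption at rest with KMS-managed keys on upload.
                "Sid": "DenyUploadsWithoutKmsEncryption",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
        ],
    }


# Hypothetical bucket name for illustration.
policy = bucket_policy("example-sensitive-data-bucket")
print(json.dumps(policy, indent=2))
```

Note that the second statement also rejects uploads that omit the encryption header, so clients must request SSE-KMS explicitly even if the bucket has default encryption configured. Whether that strictness is appropriate depends on the workload.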
Compliance and Monitoring Services:
- AWS Config: Continuously records configuration changes to your data resources and evaluates them against rules you define, helping you detect drift from security and retention policies.
- AWS Security Hub: Aggregates security findings across AWS services and checks your environment against security standards, giving you a consolidated view of data protection and compliance posture.