Remove unneeded or redundant data
ID: SUS_SUS4_1_old
Efficient data management is critical for sustainability, as it reduces the environmental impact of data storage. By systematically identifying and removing unneeded or redundant data, organizations can minimize storage consumption and decrease associated operational costs.
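As an illustration of identifying redundant data, the sketch below groups byte-identical files under a directory by content hash. It is a minimal example under stated assumptions, not a prescribed tool: the function names `file_digest` and `find_duplicates` are hypothetical, it only covers a local file system, and it reports candidates rather than deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have byte-identical content."""
    # Cheap first pass: only files of equal size can be duplicates.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_digest[file_digest(path)].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Each returned group lists paths with identical content. Which copy to keep, and whether deletion is permitted, remains a policy decision for the data owner.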
Best Practices
Implement a Data Classification Framework
- Conduct a data inventory to identify and categorize data based on its importance to business outcomes.
- Establish classification criteria that determine the sensitivity, criticality, and storage requirements of different data types.
- Utilize automated tools to classify data at scale, ensuring accuracy and efficiency in the classification process.
- Regularly review and update classification policies to accommodate changes in business needs or data relevance.
- Choose appropriate storage tiers for classified data, leveraging energy-efficient options for low-criticality data, thereby reducing energy consumption and costs.
- Implement lifecycle policies to automatically transition data to less performant storage solutions as its criticality decreases or as it ages, optimizing resource use.
- Educate stakeholders on the importance of data classification and its role in sustainability goals, fostering a culture of data stewardship.
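The link between classification and lifecycle policies can be sketched as a small mapping from classification level to a lifecycle rule. The example below is illustrative only: the classification names, day counts, and the `RetentionPolicy`, `POLICIES`, and `lifecycle_rule` names are assumptions, not values from this guidance. The returned dictionary follows the general shape of an Amazon S3 lifecycle rule (`Filter`, `Transitions`, `Expiration`), but nothing is applied to a bucket here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical per-classification policy; all values are illustrative."""
    transitions: tuple  # pairs of (age in days, target storage class)
    expire_after_days: Optional[int]  # None means never delete automatically


# Assumed classification levels and thresholds, chosen only for the example.
POLICIES = {
    "critical": RetentionPolicy(((365, "STANDARD_IA"),), None),
    "internal": RetentionPolicy(((30, "STANDARD_IA"), (180, "GLACIER")), 2555),
    "transient": RetentionPolicy((), 30),
}


def lifecycle_rule(classification: str, prefix: str) -> dict:
    """Build one lifecycle rule for objects under prefix."""
    policy = POLICIES[classification]
    rule = {
        "ID": f"{classification}-lifecycle",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
    }
    if policy.transitions:
        # Move data to colder storage classes as it ages.
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in policy.transitions
        ]
    if policy.expire_after_days is not None:
        # Delete data once it is no longer required.
        rule["Expiration"] = {"Days": policy.expire_after_days}
    return rule
```

Keeping the thresholds in one table means a policy review changes the data in `POLICIES` without touching the code that generates rules.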
Questions to ask your team
- What criteria do you use to classify data in your organization?
- How often is the data classification policy reviewed and updated?
- Are all teams aware of and trained on the data classification policy?
- How do you ensure that the right energy-efficient storage tier is chosen based on data classification?
- What processes are in place for moving data to lower-cost storage solutions as its criticality decreases?
- How do you handle data that is determined to be no longer required?
Who should be doing this?
Data Governance Officer
- Establish and maintain the data classification policy.
- Ensure alignment of the data classification policy with sustainability goals.
- Oversee data management practices to reduce storage footprint and resource utilization.
Data Architect
- Design data models that support data classification and efficient storage options.
- Recommend storage solutions that align with the classification of data.
- Implement systems to transition data to appropriate storage tiers based on usage patterns.
Compliance Officer
- Verify that data management practices comply with regulatory requirements.
- Monitor adherence to the data classification policy across the organization.
- Conduct regular audits to assess the effectiveness of data lifecycle management.
IT Operations Manager
- Manage the operational aspects of data storage and lifecycle policies.
- Ensure resources are provisioned efficiently based on classified data requirements.
- Facilitate the deletion of obsolete data in line with organizational policies.
Business Analyst
- Assess the criticality of data to business outcomes.
- Work with stakeholders to understand data usage and inform classification efforts.
- Evaluate the impact of data management practices on sustainability goals.
What evidence shows this is happening in your organization?
- A comprehensive template for creating a data classification policy that defines criteria for classifying data based on its importance to business functions and its sustainability implications. This template includes sections for data categories, responsibilities, and guidelines for choosing appropriate storage solutions.
- A report that outlines current data management practices within the organization, highlighting areas where improvements can be made to support sustainability goals. This includes an analysis of data classification practices, storage tiers in use, and opportunities for lifecycle management.
- A checklist to evaluate the efficiency of data storage solutions, encouraging teams to assess and adopt energy-efficient storage options based on data classification.
- A strategic plan that outlines steps for managing data throughout its lifecycle, including policies for transitioning to less performant storage as data requirements decrease and guidelines for safely deleting unnecessary data.
- A visual matrix that maps different types of data to corresponding storage solutions, highlighting energy efficiency and performance considerations according to the classification level of the data.
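A data management report of the kind listed above could start from a per-classification summary of how much data is stored and how much of it has not been accessed recently. The sketch below assumes a hypothetical inventory format (`InventoryItem`) and an arbitrary 180-day staleness threshold. Both are illustrative and would be replaced by whatever the organization's inventory tooling and policy define.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InventoryItem:
    """One entry in a hypothetical data inventory."""
    name: str
    classification: str
    size_bytes: int
    last_accessed: date


def summarize(items, today: date, stale_after_days: int = 180) -> dict:
    """Return total and stale bytes per classification level."""
    summary = defaultdict(lambda: {"total_bytes": 0, "stale_bytes": 0})
    for item in items:
        entry = summary[item.classification]
        entry["total_bytes"] += item.size_bytes
        # Data not accessed within the threshold is a candidate for
        # a colder tier or for deletion, subject to policy.
        if (today - item.last_accessed).days > stale_after_days:
            entry["stale_bytes"] += item.size_bytes
    return dict(summary)
```

A high share of stale bytes in a low-criticality classification points to data worth reviewing first for tiering or removal.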
Cloud Services
AWS
- Amazon S3 Intelligent-Tiering: Automatically moves objects between access tiers when their access patterns change, reducing storage costs without manual management.
- Amazon Data Lifecycle Manager: Allows you to automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs, helping to manage data over its lifecycle efficiently.
- AWS Glue: A fully managed ETL service that can help you discover, categorize, and organize your data to facilitate efficient data classification and management.
Azure
- Azure Blob Storage: Offers different access tiers (hot, cool, cold, and archive) to help optimize storage costs based on data accessibility and lifecycle, supporting classification efforts.
- Azure Data Lake Storage: Enables the classification and organization of data in a scalable environment, supporting efficient data lifecycle management policies.
- Azure Policy: Helps enforce governance on your Azure resources, allowing you to establish and apply data classification policies across your resources.
Google Cloud Platform
- Google Cloud Storage: Provides multiple storage classes (Standard, Nearline, Coldline, and Archive) that help in optimizing data storage costs based on access frequency and data lifecycle.
- BigQuery Data Transfer Service: Automates scheduled data movement into BigQuery, where data usage patterns can be analyzed to inform classification and lifecycle management strategies.
- Google Cloud Data Catalog: A fully managed data governance tool that allows for the discovery and classification of data across Google Cloud services.