Data Storage Techniques and Options
Once you have a basic understanding of the various components of a data warehouse, it is crucial to familiarize yourself with the available storage options and techniques used to store data at different stages of data processing and consumption. There are several factors to consider when choosing a storage technique or option, including the amount of data, the frequency of data updates, and the query performance requirements.
Popular storage techniques and options for data warehouses include the following:
• Relational databases: These are the most common storage systems for data warehouses. They are efficient at handling large volumes of structured data and can support complex queries. Examples include MySQL, Oracle, and Microsoft SQL Server.
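• Columnar databases: These databases store the values of each column together rather than storing complete rows together. Analytical queries that aggregate a few columns over many rows can then read only the columns they need, and similar values stored side by side compress well, which can improve query performance for large datasets. Examples include Amazon Redshift, Google BigQuery, SAP HANA, and Vertica.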
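• In-memory databases: These databases keep data primarily in main memory (RAM) rather than on disk. This can dramatically improve query performance, but it costs more because memory is more expensive per gigabyte than disk storage. Examples include SAP HANA, Oracle TimesTen, VoltDB, Microsoft SQL Server In-Memory OLTP, and MemSQL (now SingleStore).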
• NoSQL databases: These databases are designed to handle unstructured or semi-structured data, which can be useful for data warehouses that need to incorporate a variety of data types. Examples include MongoDB, Cassandra, and Neo4j.
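• Cloud-based storage: Cloud data warehouses can be a cost-effective and scalable option. They let organizations store and process large volumes of data without investing in their own hardware, and they can scale storage and compute as demand changes. Examples include Snowflake, Amazon Redshift, Google BigQuery, and Azure Synapse Analytics.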
• Data lakes: These are large repositories that hold raw data in its native format, whether structured, semi-structured, or unstructured. They serve a variety of purposes, including data exploration, machine learning, and analytics. Data lakes are usually built on object storage, which is optimized for storing large volumes of unstructured data such as images, videos, and audio files. Examples include Amazon S3, Google Cloud Storage, and Azure Data Lake Storage.
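A few lines of Python can sketch the difference between row and columnar layouts. This is a simplified model and does not show how any particular product stores data. The order records and function names are hypothetical, and "values read" stands in for disk I/O. Summing one column from the row layout touches every field of every record. The columnar layout touches only the values the query needs.

```python
# Hypothetical order records, used only to illustrate the two layouts.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]


def to_columnar(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def sum_amount_row_layout(records):
    """Sum "amount" from a row layout; every field of every row is counted as read."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)  # simplified model: the whole row is read to reach one field
        total += record["amount"]
    return total, values_read


def sum_amount_column_layout(columns):
    """Sum "amount" from a column layout; only that column is counted as read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    columns = to_columnar(ROWS)
    print(sum_amount_row_layout(ROWS))        # (245.5, 9)
    print(sum_amount_column_layout(columns))  # (245.5, 3)
```

With three fields per record, the saving here is threefold. Warehouse tables often have dozens or hundreds of columns, so queries that touch only a few of them can benefit far more.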
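The following small sketch illustrates the relational and in-memory options. It uses Python's built-in sqlite3 module. SQLite is an embedded relational engine, not a data warehouse, and its ":memory:" mode only approximates dedicated in-memory systems such as SAP HANA. The customers and orders tables, their values, and the function name are hypothetical. The sketch shows structured tables queried with a join and an aggregation, with the whole database held in RAM.

```python
import sqlite3


def revenue_by_region():
    """Build a tiny in-memory relational database and run a join plus aggregate."""
    # ":memory:" keeps the entire database in RAM; it is discarded when closed.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE customers (
                customer_id INTEGER PRIMARY KEY,
                region      TEXT NOT NULL
            );
            CREATE TABLE orders (
                order_id    INTEGER PRIMARY KEY,
                customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
                amount      REAL NOT NULL
            );
            """
        )
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "EU"), (2, "US"), (3, "EU")],
        )
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(10, 1, 120.0), (11, 2, 80.0), (12, 3, 45.5), (13, 1, 30.0)],
        )
        # Join the two structured tables and aggregate revenue and order count per region.
        query = """
            SELECT c.region, SUM(o.amount) AS revenue, COUNT(*) AS order_count
            FROM orders AS o
            JOIN customers AS c ON c.customer_id = o.customer_id
            GROUP BY c.region
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(revenue_by_region())  # [('EU', 195.5, 3), ('US', 80.0, 1)]
```

The same SQL would run largely unchanged against a disk-based SQLite file. The storage location is a deployment choice that sits beneath the relational model.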
Overall, the choice of schema and storage options in a data warehouse will depend on the specific needs and goals of the organization.