Data in the broad sense consist of metrics and measures of physical and abstract entities that are important to the business. In developing a data architecture for the enterprise it is common to use a layered structure as follows:
- A set of standards, principles and definitions of concepts, entities and their relationships
- A conceptual model of the data
- A logical model of the entities (including a list of their data attributes) and their relationships
- A physical model for implementing and maintaining the data (technical/solution architecture for data persistence and provision)
- A technical specification of its implementation in systems (database designs/storage and retrieval designs)
A data architecture commonly incorporates the first three layers, leaving the fourth and fifth to the technical/solution architecture practice.
Standards and principles might comprise the following:
- Quality assurance - which data are quality checked at ingestion before being persisted? For instance: checking that they are valid with respect to data type (a date is a date, etc.) and comply with business rules (e.g. policy holder’s age must be between 21 and 75; parts must always have a reference number)
- Managing currency and validity - every entity has a single master data source, and only the master can be modified. Copies of entity data can be held elsewhere but must be immutable and refreshed or replaced within a prescribed period after the source has been modified. The prescribed period may vary between use cases, so it must be chosen carefully to deliver the most benefit across current use cases while keeping flexibility for future ones. Data that are a month old are of no use to an emerging use case that needs them to be no more than 24 hours old, unless their management is re-engineered for the new use case
- Managing optimisation and usage - operational data must be separated from data used for analysis and reporting. The physical demands on the provision and maintenance of the data are likely to differ between the two kinds of use, leading to conflicts between them. For example, operational data from systems of record could be copied to a separate repository for reporting and analysis, so that analytical and reporting workloads cannot delay operational use or have unwelcome repercussions for operations
- Managing regulatory and policy compliance - data privacy regulations and data security policies will impose mandatory requirements and standards that will need to be accommodated in descriptions and designs, both conceptual and logical, in the data architecture
- Stewardship and accountability - each data domain should have an ‘owner’ who is responsible and accountable for ensuring that principles, standards, and regulations are upheld, that changes are properly planned and authorised, and that risks are appropriately mitigated
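The quality assurance and currency principles above can be made concrete with a small sketch. It assumes records arrive as plain dictionaries; the field names (`date_of_birth`, `reference_number`), the function names and the ISO date format are hypothetical choices for illustration, and only the rules themselves (age between 21 and 75, parts always have a reference number, copies refreshed within a prescribed period) come from the text.

```python
from datetime import date, datetime, timedelta


def check_policy_holder(record: dict, today: date) -> list[str]:
    """Return rule violations for a policy holder; an empty list means it passes."""
    # Data type check: the date of birth must parse as an ISO date (YYYY-MM-DD).
    try:
        born = date.fromisoformat(record.get("date_of_birth", ""))
    except (TypeError, ValueError):
        return ["date_of_birth is not a valid date"]
    # Business rule: age in whole years must be between 21 and 75 inclusive.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    if not 21 <= age <= 75:
        return [f"age {age} is outside 21-75"]
    return []


def check_part(record: dict) -> list[str]:
    """Business rule: a part must always have a non-empty reference number."""
    ref = record.get("reference_number")
    if not isinstance(ref, str) or not ref.strip():
        return ["reference_number is missing"]
    return []


def copy_is_current(source_modified: datetime, copy_refreshed: datetime,
                    now: datetime, max_lag: timedelta) -> bool:
    """A copy is acceptable if it was refreshed since the master last changed,
    or if the master changed so recently that the prescribed period has not
    yet elapsed."""
    return copy_refreshed >= source_modified or now - source_modified <= max_lag
```

A record would be persisted only when its check returns an empty list. The `max_lag` argument stands for the prescribed refresh period, which the text notes may differ between use cases.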
Conceptual data models deal with the main entities and their relationships. For an enterprise these may include offices/locations, employees, customers, orders, products, shipments, etc., each with relationships to the others and roles to play. It is common to break the model down and construct it in functional domains such as ‘customer service’, ‘sales’, ‘Human Resources’, etc. Referring back to the functional architecture, the conceptual data domains will align with their corresponding functional/process domains.
The logical data model reaches a greater level of detail, providing the attributes that describe each entity and stipulating which of these must meet certain conditions such as being mandatory and/or unique. The model itself can also be arranged according to ‘normal forms’. Normalisation avoids redundancy in the data model and prescribes what is required to adequately attribute each entity (e.g. mandatory attributes that must be set, uniqueness, valid ranges/values). It pays special attention to the dependencies between entities, ensuring that references are used between them so that data attributes are accessible throughout the model without being reproduced wherever they are required. For example: a person’s name is an attribute of the person entity and, if it is required elsewhere, for instance on an invoice, it can be acquired using the relationship between the invoice entity and the person entity, albeit via other intervening entity relationships if necessary. Thus, a logical model holding the name in both the person entity and the invoice entity can be ‘normalised’ so that it is only an attribute of ‘person’.
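The person and invoice example can be sketched as follows. An in-memory SQLite database is used here only as a convenient way to express entities, constraints and references; the table and column names, the `email` attribute and the `invoice_view` helper are assumptions made for illustration, not part of the model described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references between entities
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,          -- mandatory attribute
    email     TEXT NOT NULL UNIQUE    -- mandatory and unique (hypothetical attribute)
);
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    amount     REAL NOT NULL CHECK (amount >= 0)  -- valid range
    -- no 'name' column: the name is held only on person
);
""")


def invoice_view(conn: sqlite3.Connection, invoice_id: int):
    """Assemble an invoice, acquiring the name through the invoice-person reference."""
    return conn.execute(
        """SELECT i.invoice_id, p.name, i.amount
           FROM invoice AS i JOIN person AS p ON p.person_id = i.person_id
           WHERE i.invoice_id = ?""",
        (invoice_id,),
    ).fetchone()


conn.execute("INSERT INTO person VALUES (1, 'Ada Example', 'ada@example.com')")
conn.execute("INSERT INTO invoice VALUES (100, 1, 120.0)")
before = invoice_view(conn, 100)

# The name is changed in one place only; the invoice picks it up via the reference.
conn.execute("UPDATE person SET name = 'Ada Sample' WHERE person_id = 1")
after = invoice_view(conn, 100)
```

Because the name is stored once, correcting it on the person entity is enough. Had the invoice carried its own copy, the two values could drift apart.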