Guidelines for Data Point Modeling
From XBRLWiki
Revision as of 05:40, 8 October 2013
Introduction
General
The purpose of this document is to support supervisory experts in the creation of a Data Point Model (DPM). As defined by the European Banking Authority (EBA), a DPM “is a structured formal representation of the data [...] identifying all the business concepts and its relations, as well as validation rules, oriented to all kind of implementers.”1 The underlying rules for the creation of such models were initially introduced by the Eurofiling Initiative and developed further by the European Insurance and Occupational Pensions Authority (EIOPA). The main objective of data point modelling, the process of creating a DPM, is that “[it] should help to produce a better understanding of the legal background to the prudential reporting data and make data analysis much easier for both the institutions and regulators”2. Further goals are to prevent redundancies, to lower maintenance efforts and, in general, to facilitate working with national extensions of the data set agreed upon at European level, so that descriptions of requirements can be shared across national legislations. All the information collected by the national supervisory agencies, particularly in Europe, must be transformed into the same data structure with the same quality so that standardised analysis of the data can be carried out across Europe. The current implementations are not able to meet these European requirements for supervision “to achieve higher quality and better comparability of data”3. The main reason for this is the differences between the data definitions and the data formats of the various national supervisory agencies, which make comparison of reported data virtually impossible.
Objective
The aim of harmonising European supervisory reporting is to enable more comprehensive analysis and to increase the comparability of data. Supervisory agencies are already acquainted with the representation of regulations specified in laws. This document introduces the reader to the Data Point modelling methodology as well as to its main terms and definitions, which will enable you to create, on your own, Data Point Models that contain “all the relevant technical specifications necessary for developing an IT reporting format”.
Target audience
In general, you as a banking supervisor are responsible for communicating with Information Technology (IT) experts in order to support the transfer of the essence of regulatory reporting to IT systems. In 2009 the Eurofiling Initiative published the concept of Data Point modelling. Structures of data represented in supervisory tables, as well as the underlying laws and guidelines, were defined in order to enable the interpretation of the reporting information by IT applications. IT specialists are responsible for the development of software; however, most of the time they do not have the specialist business knowledge needed to gather reporting requirements from various sources, such as legal texts like Solvency Regulations and National Banking Acts, in order to build a faultless system. Therefore the task of creating a DPM is assigned to you. This document introduces the basic principles deemed necessary in the modelling process. On the basis of the explanations given in this document you will be able to provide the prerequisites for deriving data formats from a DPM as well as for setting up a powerful data warehouse. This implies that the model is issued in a format that is understood by both parties involved in transforming legislation into a model: business experts and IT specialists. The topics regarding supervisory reporting are kept short and limited to the content relevant for this paper. The idea is to convey the creation of the Data Point Model to you as a supervisor with analytical capabilities and a personal interest in this topic. No special IT knowledge is expected; the first sections give you an overview of the IT knowledge required. National banking supervisors have a mandate to evaluate the financial situation of financial institutions in their country. To be able to perform the necessary analytics, financial data is required from these institutions. The requirements are described in the form of texts and tables of data.
To turn these texts and tables into a comprehensive model, a data model is created that enables IT support in communicating and storing the necessary data. A common problem at the national supervisory agencies (NSAs) is that IT staff have little financial background and financial specialists have little IT background. This makes data modelling a problematic area, as both specialities are needed. This document aims to give financial specialists the tools and knowledge to create a DPM. The resulting model can be perfected by IT staff later in the process.
Scope
This paper is a handbook for supervisory experts. The main body consists of four sections. The section headings are phrased as questions, which helps you choose the section that promises the most answers to your problem. After this first introductory section the main part starts by providing basic knowledge about different types of data models and data modelling approaches. The second and third sections provide an overview of data models in general, in contrast to the fourth section, which highlights the necessity of data modelling for supervisory data. This fourth section derives the objectives based on the background information of the preceding sections. Furthermore, one paragraph classifies the Data Point Model introduced by the Eurofiling Initiative and elaborated by EIOPA and EBA, and introduces many new terms related to the DPM. A paragraph explaining the areas of application for the DPM follows. The fourth section concludes with a paragraph introducing a subset of the technical constraints that need to be considered in the creation process of the DPM. The fifth section gives step-by-step instructions for creating a DPM. The paper concludes with remarks on the progress achieved so far and provides an outlook on the software that is currently being developed to support you during the creation process. The last section also compares the DPM process with more traditional approaches. New terms are introduced throughout the text when they come up for the first time and can additionally be looked up in the glossary, which can be found in the appendix at the end of the paper.
Terms and definitions
For the purposes of this document, the following terms and definitions apply.
NOTE The terms and definitions used in connection with Data Point modelling are inspired by vocabulary already known through its use for describing multidimensional databases and data warehouses. IT specialists originally introduced these terms. However, for understanding and creating Data Point Models they are now established in the language of business specialists as well.
data point
a Data Point can be compared to a cell in a table that holds reportable information; the row and column headers characterising the Data Point can be regarded as the dimension and member combinations that apply to the Data Point
default member
a member in an enumerable dimension that will represent the dimension-member combination on a Data Point when that dimension is not explicitly associated
dictionary element
an abstract term for dimensioned elements, dimensions, domains and members
dimension
a dimension represents the “by” condition of a Data Point
Note 1 to entry: Dimensions literally describe the dimensioned element in order to limit the range of interpretation and thereby qualify the dimensioned element. One dimension either has a definite (i.e. countable) number of members, which is called an explicit dimension, or an infinite number of members represented as values, that follow a specific typing pattern, which is known as a typed dimension.
dimensioned element
a dimensioned element shows the nature of the data by typing it. It holds information about the underlying structure of the cell that is specified. In IT contexts a dimensioned element is referred to as metadata
domain
a domain is a classification system to categorize items that share a common semantic identity
Note 1 to entry: A domain therefore provides an unambiguous collection of items in a value range. A domain can have a definite, and therefore countable, number of items, or an infinite number of items that follow a specific (syntax) pattern.
domain member
each element that is part of a domain is called a domain member
Note 1 to entry: It is also possible to have members that do not belong to a domain; they can refer to a dimension directly.
Note 2 to entry: Domain members can either be explicitly named or defined by a type.
enumerable dimension
an enumerable dimension is a dimension that “specifies a finite number of members”
fact
a fact describes the quantitative aspects of data reported
EXAMPLE An amount, a number, a string of text, a date.
hierarchy
a non-enumerable dimension “specifies an undefined number of [members] [...] [it] defines syntactic constraints on the values of the members, i.e. a data type or a specific pattern
non-enumerable dimension
a non-enumerable dimension “specifies an undefined number of [members] [...] [it] defines syntactic constraints on the values of the members, i.e. a data type or a specific pattern”
sub-domain
a sub-domain is a subset of the members of a domain
taxonomy
a taxonomy describes a valid Data Point Model
templates
graphical representation of a set of supervisory data
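The relationships between the terms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a normative representation of a DPM: the class names `Domain` and `Dimension`, the method names, the member "total" used as default member, and the identifier pattern are all hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """A classification system whose members share a common semantic identity."""
    name: str
    members: frozenset = frozenset()  # explicitly named members (enumerable case)
    pattern: Optional[str] = None     # syntax pattern for typed members (hypothetical)


@dataclass(frozen=True)
class Dimension:
    """The "by" condition of a Data Point; it draws its members from a domain."""
    name: str
    domain: Domain
    default_member: Optional[str] = None

    @property
    def enumerable(self) -> bool:
        # Enumerable: finite list of members. Non-enumerable: members follow a pattern.
        return self.domain.pattern is None

    def accepts(self, member: str) -> bool:
        if self.enumerable:
            return member in self.domain.members
        return re.fullmatch(self.domain.pattern, member) is not None

    def resolve(self, member: Optional[str] = None) -> str:
        """Return the member that applies, falling back to the default member."""
        if member is None:
            if self.default_member is None:
                raise ValueError(f"dimension {self.name!r} has no default member")
            return self.default_member
        if not self.accepts(member):
            raise ValueError(f"{member!r} is not a member of {self.name!r}")
        return member


# Enumerable dimension with a (hypothetical) default member "total".
risk_domain = Domain("Risk type", frozenset({"general risk", "specific risk", "total"}))
risk_type = Dimension("Risk type", risk_domain, default_member="total")

# Non-enumerable dimension: members only have to match a (hypothetical) pattern.
id_domain = Domain("Identifier", pattern=r"[A-Z]{2}[0-9]{6}")
counterparty = Dimension("Counterparty", id_domain)
```

In this sketch a dimension that is not explicitly associated with a Data Point resolves to its default member, while a non-enumerable dimension accepts any value that satisfies the syntactic constraint.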
What is a data model
Introduction
Data models outline the relationships between data. It is important that the person responsible for modelling takes time to capture all relations between data that can be shown in the model. It is essential that the model is reviewed by the third parties involved, so that errors can be identified at an early stage. Such reviews also help to arrive at a clearly structured model, which can save time and costs later.
The term “model”
The term model has its origin in the French noun “modelle”. In an IT context, a model depicts a system for a specific purpose, so that the complex system itself does not have to be manipulated directly. Specifically, in terms of data models this means that a real system, a system from the domain comprised of real components that are tangible and dynamic, is mapped to a model to reduce complexity. This may help to find a suitable solution to an existing problem. The model needs to be created as close to reality as possible, with attention to requirements regarding structure and behaviour. Nevertheless, in order to improve comprehensibility, aspects irrelevant for the purpose of modelling may be left out. The importance of a single aspect, and whether it is worth specifying in the model, is decided by the domain experts. The outcome strongly depends on the modeller’s understanding, creativity and capability to associate the object system with the model. The challenge of data modelling is that a data model “must be simple enough to communicate [it] to the end user [...] [and] [...] detailed enough for the database design to use to create the physical structure”. The same principle applies to message design and its physical representation. In the following paragraph the procedure of data-oriented modelling is presented.
Data-oriented process of modelling
The data-oriented process focuses on describing the static structure of the reporting system, in contrast to the function-oriented process, which begins with modelling the functions of the reporting system and adds the data at a later stage. As data is the focal point for banking supervisors, the data-oriented process is applied. Additionally, data objects do not change as much over time as functions do. Functions are not taken into account here. In the data-oriented process, the data objects and the attributes that belong to each data object are specified first. The next step is to put the objects in relation to each other. Furthermore, the data model can include integrity conditions and define operations that can be carried out on the data.
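The order of steps in the data-oriented process (data objects and attributes first, then relations, then integrity conditions and operations) can be illustrated with a small Python sketch. Everything in it is a hypothetical example and not taken from any reporting framework: the objects `Institution` and `Report`, their attributes, and the integrity condition that reported amounts must not be negative are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


# Step 1: data objects and their attributes.
@dataclass
class Institution:
    identifier: str
    name: str


@dataclass
class Report:
    # Step 2: relation between objects (a report belongs to one institution).
    institution: Institution
    reference_date: date
    amounts: Dict[str, float] = field(default_factory=dict)

    # Step 3a: an integrity condition (hypothetical: no negative amounts).
    def integrity_violations(self) -> List[str]:
        return [label for label, amount in self.amounts.items() if amount < 0]

    # Step 3b: an operation that can be carried out on the data.
    def total(self) -> float:
        return sum(self.amounts.values())


bank = Institution("BANK-001", "Example Bank")
report = Report(bank, date(2013, 12, 31), {"long": 50.0, "short": -5.0})
```

Here `report.integrity_violations()` lists the labels that break the assumed condition, and `report.total()` is an example of an operation defined on the data.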
The conceptual data model as a first step aiming for a database system
Data-oriented modelling takes place on three different levels that build upon one another.
[Figure 1: levels of data-oriented modelling]
The conceptual data model reflects your reporting requirements. You are in the best position to know what pieces of information are requested. The conceptual model helps you in communicating with your IT specialists. This is an important step to avoid unpleasant surprises later, when the model is implemented in the IT department. The model is built regardless of the database system or data warehouse to be used. Relevant facts of the object system are to be specified without loss of information. However, you, as the creator of the conceptual model, do not need to be technically skilled, as the succeeding steps of data modelling are carried out by IT specialists, who take care of the technical requirements. It is very important that this first step of preparing the conceptual data model is carried out carefully before the information is handed over to IT. This can be ensured by early reviews that include all parties concerned. The logical data model as well as the physical data model are prepared by the IT specialists. The logical data model immediately follows the conceptual model (see Figure 1). When a database approach is pursued, the logical model, in contrast to the conceptual model, also takes the requirements of the database or the data warehouse into account. The physical data model, as the final step, describes the actual implementation in an existing database system.
Description of data modelling approaches for supervisory purposes
Introduction
This paragraph deals with the methods that are used to disseminate data and identify all of its relevant aspects. The two most appropriate methods of expressing regulatory data in a structure that determines the context this information is associated with are discussed here. Both modelling approaches refer to metadata.
Definitions for data and metadata are given below:
Data is “information processed or stored by a computer. This information may be in the form of text documents, images, audio clips, software programs, or other types of data. Computer data may be processed by the computer's CPU and is stored in files and folders on the computer's hard disk.”
Metadata “describes data. It provides information about a certain item's content.”
While data is a number like “50”, the metadata adds qualifying information to the number. The explanation of the “form centric” and the “data centric” modelling approaches will clarify the difference.
Using the “form centric” modelling approach
The “form centric” approach uses an ordinary table format, with information held in the cells of a predefined table called a template. Here a template is understood as a graphical representation of a set of supervisory data. This approach identifies reporting data by their position in the templates. Each datum is defined by its coordinate in the table, which is given by the combination of a column and a row of a template. Each coordinate has a code that is based on the row code and the column code. This means that data reported on the basis of coordinate codes is meaningless without the context of the template. In the following example, each cell that represents a data requirement is described by a code combination of its column and its row in the table Market Risk: Standardised form for position risk in equities (MKR SA EQU) of the COREP framework. The form represents market risk equity positions of the institutions that are subject to mandatory reporting. Throughout the whole document this table serves as an example to introduce terms and concepts of Data Point modelling to you. For better clarity, the table with annotations can be found in full size in the appendix.
[Figure 2: template MKR SA EQU with row and column codes]
The “form centric” approach is oriented towards the visualisation of the data. Dependencies between the codes of the data are only shown in the templates, i.e. by identifying the appropriate headlines or by the indents of the row labels. A report based on the “form centric” approach, which uses codes for the identification of data, is not able to carry the dependencies that are visible in the template.
[Figure 3: section of sample table MKR SA EQU]
The “form centric” approach is explained on the basis of the section of sample table MKR SA EQU shown in Figure 3. The value reported by the institution in each cell is called a fact. Facts are classified as data. Let us say the circled cell defined by the row position r021 and the column position c010 holds the monetary value 50. The coordinate code r021c010 in the red circle is the combination of the row position followed by the column position. Taking the template into account, we realise that the number “50” represents a value for derivatives as a gross position. When we additionally include the headline above column c010, we can conclude that a long position is reported. The excerpt does not specify to which year this information belongs. Neither do we know whether 50 represents a value in thousands or millions, nor can we conclude its currency. We can imagine that it would be really hard for a non-supervisor to correctly classify this information “50”. Now think about the table shown in Figure 3 again: what would the numbers tell you if there were no headlines labelling the rows and the columns? Obviously the information would be useless. In conclusion, the “form centric” approach does not include the information about the reported data that is assumed to be known (for example, that all figures are in thousands). Moreover, without the context of the row and column position of the datum, the information content is essentially zero.
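The dependence of a coordinate code on its template can be made concrete with a short Python sketch. It is only an illustration: the function names, the assumed three-digit code format, and the label texts (paraphrased from the description of Figure 3 above) are assumptions and not part of the COREP framework.

```python
import re

# A "form centric" report: facts keyed only by coordinate codes.
report = {"r021c010": 50}

# Assumed code format: "r" + three digits, followed by "c" + three digits.
COORDINATE = re.compile(r"(r\d{3})(c\d{3})")


def split_coordinate(code):
    """Split a coordinate code such as 'r021c010' into row and column codes."""
    match = COORDINATE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a coordinate code: {code!r}")
    return match.group(1), match.group(2)


def interpret(code, value, row_labels=None, column_labels=None):
    """Describe a fact; without the template labels its meaning stays unknown."""
    row, column = split_coordinate(code)
    if not row_labels or not column_labels:
        return f"{code} = {value} (meaning unknown without the template)"
    return f"{row_labels[row]} / {column_labels[column]} = {value}"


# Labels as they would be read off the template (paraphrased, hypothetical wording).
rows = {"r021": "Derivatives"}
columns = {"c010": "Gross position, long"}

print(interpret("r021c010", report["r021c010"]))
print(interpret("r021c010", report["r021c010"], rows, columns))
```

Even with the labels attached, the unit, the currency and the reference period are still missing, which is exactly the limitation of the “form centric” approach described above.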
Using the “data centric” modelling approach
In the “data centric” approach, data is identified by a set of characteristics. A datum is considered independently of its graphical representation, because information is added that defines it unambiguously. Therefore no positional alignment is needed in order to give the datum a specific meaning. Any datum is expressed in terms of the categories necessary for its identification.
Information available is divided into two groups:
qualifying information
quantifying information
Qualifying information is represented by attributes to certain categories while quantifying information describes the object evaluated.
Figure 5 shows a dimensioned element, which holds the information about the main character of the datum to be reported. A dimensioned element shows the nature of the data. It holds information about the underlying structure of the cell that is specified. In IT contexts a dimensioned element is referred to as metadata. In our example the dimensioned element specifies the amount type of the datum as a gross value. The corresponding categories, called dimensions, contain further information on the datum and therefore increase the quality of the datum to be reported. The dimensioned element as well as the dimensions belong to the group of qualifying information, i.e. metadata. The number itself, “50” in our example, is called a fact and represents the quantifying information of the datum.
[Figure 5: dimensioned element with its dimensions and the fact]
One Data Point is represented by one cell of the table in the “form centric” approach. Going back to the example used above to explain the “form centric” approach, where the cell is defined by a combination of row and column codes (like r021c010), we now have a Data Point specified by a dimensioned element with its corresponding dimensions. One possible dimension that can be derived from the table in Figure 3 is the risk type dimension. Various types of risk are listed in the rows of this table; “general risk” and “specific risk” are reasonable attributes for the risk type dimension. Business knowledge is needed to identify the risk types. We cannot rely on the nesting (indentation) in the table, as table creators might use it differently for presentation purposes. Each dimensioned element is characterised by a variable number of dimensions. Each dimension is linked to one attribute, called a member, to characterise the Data Point. The dimensions represent the “by” conditions. Dimensions literally describe the dimensioned elements in order to limit the range of interpretation and thereby qualify a dimensioned element. A dimension either has a definite (i.e. countable) number of members, in which case it is called an enumerable dimension, or a list of members that is unknown to the regulator, in which case it is called a non-enumerable dimension. Members are attributes that can be assigned to a dimension. As members are often used for various dimensions, domains are introduced in order to reduce redundancy. Each domain contains semantically correlated members that can be used throughout the whole of the reporting framework. The dimension represents the semantic relevance for the specific use on the dimensioned element. All members are added to at least one domain that can be reused by a variety of dimensions. Returning to the difference between metadata and data, the definitions can now be applied to the concrete example of MKR SA EQU.
The Data Point identified by the row and column code combination r021c010 in the table format holds a fact, “50”, which can be referred to as data. The metadata consists of the dimensioned element, which specifies 50 to be a gross value, and the selected members, one for each applied dimension. It should be ensured that each Data Point is defined only once in a reporting framework, regardless of whether it is included in more than one table. One major benefit is that the information can be assembled in various ways based on the preference of the supervisory expert. Therefore the form of the tables can be aligned with the previously used “form centric” tables. This results in a minimum adaptation time for the filers.
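A minimal Python sketch may help to show how a Data Point defined once by its characteristics can still be presented in several tables. All names are assumptions made for the example: the class `DataPoint`, the helper functions, the dimension names “Position type” and “Instrument”, and the second template “Hypothetical overview” do not come from the COREP framework.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class DataPoint:
    """A Data Point: a dimensioned element plus dimension/member pairs."""
    dimensioned_element: str
    characteristics: FrozenSet[Tuple[str, str]]


def define(element: str, dimensions: Dict[str, str]) -> DataPoint:
    # The order of the dimensions does not matter; only the combination does.
    return DataPoint(element, frozenset(dimensions.items()))


# Metadata: defined exactly once, independent of any table layout.
gross_long_derivatives = define(
    "Gross amount",
    {"Risk type": "General risk", "Position type": "Long", "Instrument": "Derivatives"},
)

# Data: the fact is attached to the Data Point, not to a cell position.
facts = {gross_long_derivatives: 50}

# Presentation: templates merely map cell coordinates to Data Points.
templates = {
    "MKR SA EQU": {"r021c010": gross_long_derivatives},
    "Hypothetical overview": {"r010c030": gross_long_derivatives},
}


def render(template_name: str) -> Dict[str, int]:
    """Fill the cells of a template with the facts of their Data Points."""
    return {code: facts.get(dp) for code, dp in templates[template_name].items()}
```

Because both templates refer to the same `DataPoint`, the fact “50” is reported once and appears in every table that includes it, and a familiar “form centric” layout can be rebuilt from the mapping at any time.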