LargeInstanceProcessing
From XBRLWiki
Introduction
Several families of taxonomies can lead to potentially large instances (from a few tens of kilobytes up to several gigabytes).
The taxonomies currently known to have this characteristic are:
- Taxonomy of Bank of Indonesia, for which an XBRL White paper has been published (http://www.xbrl.org/sites/xbrl.org/files/imce/lrg_instance_proc_indonesia.pdf);
- Solvency II taxonomies defined by EIOPA (European Insurance and Occupational Pensions Authority: https://eiopa.europa.eu);
- Basel III / CRD IV taxonomies, COREP and FINREP, defined by EBA (European Banking Authority: http://www.eba.europa.eu).
Note: the European taxonomies are intended to be used by all countries of the European Union, and more.
The size of these instances is typically due to lists of details for items such as loans, financial products or assets.
Tests made with such instances have led to difficulties.
The subject is being tackled by XBRL International, in the Standards Board and the Best Practices Board, and has been discussed at XBRL International conferences, notably during the 25th XBRL Conference in Yokohama (December 2012):
- The Challenges of Processing Large Instances by Ashu BHATNAGAR (XBRL International) and Michal PIECHOCKI (BR-AG) - http://archive.xbrl.org/25th/sites/25thconference.xbrl.org/files/TECH2Large%20instances%20session.pdf
- Large Instances Technology by Paul WARREN (CoreFiling) - http://archive.xbrl.org/25th/sites/25thconference.xbrl.org/files/TECH2LargeInstances.pdf
A Working Group Note has been published by XBRL International, mainly proposing to adopt a streaming solution and an adequate structure for XBRL instances.
This Wiki is a forum where this topic can be freely discussed.
Types of difficulties
Difficulties may arise at different stages of instance processing, when:
- loading the taxonomy
- generating the instance
- signing the instance
- transmitting the instance
- parsing the instance
- validating the instance
- checking business rules
- reporting errors
- rendering the instance
Loading the taxonomy
In some cases, big instances correspond to big taxonomies.
When a Data Point Model is used in instances (the case of highly dimensional taxonomies), instances are bigger than for moderately dimensional taxonomies, where some dimensional aspects are hidden. This large set of dimensional elements also leads to a big taxonomy.
Sometimes it is necessary to split a taxonomy into several entry points to avoid a DTS that is too big; this is the case of the COREP taxonomy, which had to be split into four parts.
In the case of multi-lingual taxonomies, like the European ones, the existence of labels in several languages also inflates the size of the taxonomy. Care must be taken to include only the labels used in a given country (there are 24 languages in the European Union, plus Norwegian and Icelandic).
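As an illustration of this kind of label pruning, the sketch below (Python with lxml; the file names and the set of languages are assumptions for the example, not part of any official tooling) removes from a label linkbase every label resource whose language is not needed for a given country package:

 # Sketch: keep only the label languages needed for a given country package.
 from lxml import etree

 KEEP = {"en", "fr"}  # assumed set of languages to retain
 LINK = "http://www.xbrl.org/2003/linkbase"
 XML = "http://www.w3.org/XML/1998/namespace"

 tree = etree.parse("full-label-linkbase.xml")   # assumed input file
 for label in tree.iter(f"{{{LINK}}}label"):
     lang = label.get(f"{{{XML}}}lang")
     if lang is not None and lang.split("-")[0] not in KEEP:
         # A real tool would also drop the labelArcs pointing to this resource.
         label.getparent().remove(label)
 tree.write("reduced-label-linkbase.xml", xml_declaration=True, encoding="UTF-8")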
Generating the instance
The FRIS document puts constraints on the ordering of units and contexts, which should appear before facts, but this rule must be relaxed because it hinders the streaming of instances.
This aspect is covered by the Working Group Note.
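For illustration only (the Working Group Note remains the reference), the sketch below writes an instance in a streaming fashion: each context is emitted just before the fact that uses it, so nothing has to be buffered. All names are made up, and mandatory pieces such as schemaRef, units and decimals are omitted for brevity.

 # Sketch: streaming generation of an instance, one record at a time.
 from xml.sax.saxutils import escape

 def write_instance(records, path):
     """records: iterable of (concept local name, context id, instant, value)."""
     with open(path, "w", encoding="utf-8") as out:
         out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
         out.write('<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"\n'
                   '            xmlns:ex="http://example.com/taxonomy">\n')
         for concept, ctx_id, instant, value in records:
             # The context is written immediately before the fact using it.
             out.write(f'  <xbrli:context id="{ctx_id}">\n'
                       '    <xbrli:entity><xbrli:identifier scheme="http://example.com">'
                       'LEI123</xbrli:identifier></xbrli:entity>\n'
                       f'    <xbrli:period><xbrli:instant>{instant}</xbrli:instant></xbrli:period>\n'
                       '  </xbrli:context>\n')
             out.write(f'  <ex:{concept} contextRef="{ctx_id}">{escape(str(value))}</ex:{concept}>\n')
         out.write('</xbrli:xbrl>\n')

 # The records come from a lazy generator, so they are never all in memory.
 rows = ((f"Loan{i}", f"c{i}", "2014-12-31", i * 100) for i in range(10000))
 write_instance(rows, "large-instance.xbrl")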
Signing the instance
Typically, supervisors request the signing of transmitted instances to ensure integrity and non-repudiation.
Sometimes it is also necessary to encrypt the instance to ensure confidentiality.
Security tools may have limitations, and adequate tools must be used.
It could also be possible to sign or encrypt a compressed file, but this would require a canonical compression algorithm.
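As a sketch of how the memory issue can be sidestepped when signing, the file can be hashed in chunks and only the digest signed (Python with the third-party cryptography package; an RSA key in a PEM file is assumed, and the XML-DSig or other packaging a supervisor would actually require is left out):

 # Sketch: detached signature over a large instance, hashed in constant memory.
 import hashlib
 from cryptography.hazmat.primitives import hashes, serialization
 from cryptography.hazmat.primitives.asymmetric import padding, utils

 def file_digest(path, chunk_size=1 << 20):
     h = hashlib.sha256()
     with open(path, "rb") as f:
         for chunk in iter(lambda: f.read(chunk_size), b""):
             h.update(chunk)  # memory stays constant whatever the file size
     return h.digest()

 def sign_instance(instance_path, key_path):
     with open(key_path, "rb") as f:
         key = serialization.load_pem_private_key(f.read(), password=None)  # assumed RSA key
     # Sign the precomputed digest instead of rereading the whole document.
     return key.sign(file_digest(instance_path),
                     padding.PKCS1v15(),
                     utils.Prehashed(hashes.SHA256()))

 with open("large-instance.xbrl.sig", "wb") as sig:
     sig.write(sign_instance("large-instance.xbrl", "reporter-key.pem"))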
Transmitting the instance
Sending a multi-gigabyte document may cause difficulties but should be possible (technologies exist to exchange video files of several gigabytes).
It is possible to transmit a compressed file, which should be much smaller due to the high compression ratio of XML / XBRL files.
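A minimal sketch of the compression step, reading and writing in fixed-size chunks so the instance never has to be loaded entirely (file names are arbitrary):

 # Sketch: gzip a large instance in 1 MiB chunks before transmission.
 import gzip
 import shutil

 with open("large-instance.xbrl", "rb") as src, \
      gzip.open("large-instance.xbrl.gz", "wb") as dst:
     shutil.copyfileobj(src, dst, length=1 << 20)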
Parsing the instance
This aspect is covered by the Working Group Note.
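For illustration only (the Working Group Note is the reference on this point), a streaming parser such as lxml's iterparse handles facts one at a time and frees them immediately, keeping memory usage roughly constant:

 # Sketch: streaming parse of an instance, facts handled and released one by one.
 from lxml import etree

 def iter_facts(path):
     for _, elem in etree.iterparse(path, events=("end",)):
         if elem.get("contextRef") is not None:  # top-level elements with contextRef are facts
             yield elem.tag, elem.get("contextRef"), elem.text
         # Release the processed element and its already-seen siblings.
         elem.clear()
         while elem.getprevious() is not None:
             del elem.getparent()[0]

 print(sum(1 for _ in iter_facts("large-instance.xbrl")), "facts parsed")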
Validating the instance
In this section, validation means the enforcement of the rules defined in XBRL 2.1 and XBRL Dimensions 1.0.
Regarding memory, such a validation may be done fact by fact, with no need to keep the information in memory.
For dimensional validation, the context (or a representation of it) must be accessed; it is thus necessary to keep context-related information available.
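A minimal sketch of that memory profile, assuming contexts appear before the facts that reference them: a compact representation of each context is kept in a dictionary, while facts are checked one by one and then discarded. Only a toy check (dangling contextRef) is shown, not a full XBRL 2.1 / Dimensions validator.

 # Sketch: fact-by-fact checking, retaining only compact context information.
 from lxml import etree

 XBRLI = "http://www.xbrl.org/2003/instance"
 contexts = {}   # context id -> compact representation (here, just the instant)
 errors = []

 for _, elem in etree.iterparse("large-instance.xbrl", events=("end",)):
     if elem.tag == f"{{{XBRLI}}}context":
         instant = elem.find(f"{{{XBRLI}}}period/{{{XBRLI}}}instant")
         contexts[elem.get("id")] = instant.text if instant is not None else None
         elem.clear()
     elif elem.get("contextRef") is not None:           # a fact
         if elem.get("contextRef") not in contexts:
             errors.append(f"{elem.tag}: unknown context {elem.get('contextRef')}")
         elem.clear()                                    # the fact itself is not kept

 print(len(errors), "errors found")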
Checking business rules
Business checks are typically exercised through assertions (defined by the XBRL Formula specifications).
This is a difficult point for XBRL processors, which spend a lot of time on this task.
Software providers may propose optimisations in the expression of formulas (for example, removing unneeded filters, factoring out filters used several times, or putting expressions in variables).
Several optimisations may be considered (to be discussed):
Disposition of facts no longer needed
To process assertions, all the information of the instance must remain accessible, except for facts for which all assertion evaluations have already fired. For example, a fact that is the only one to bind to an assertion (e.g. A > 0) no longer needs to be accessible for this assertion after it has fired.
If, for each fact, a reference count were initialised with the number of possible assertion evaluations concerning this fact and decremented each time such an evaluation fires, it would be possible to free the memory associated with the fact (a sketch is given after the list of drawbacks below).
However, freeing memory fact by fact may have some drawbacks:
- given memory fragmentation, it may be suboptimal in languages that use a garbage collector, like Java or C#;
- computing the possible number of evaluations may be difficult, considering implicit filtering and fall-back values;
- memory consumption may be lower, but the time taken to handle the reference counts would increase the processing time.
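The reference-count idea can be sketched as follows; the expected number of evaluations per fact is assumed to be known in advance (which, as noted above, is the hard part):

 # Sketch: free a fact once every assertion evaluation needing it has fired.
 class FactStore:
     def __init__(self):
         self._facts = {}  # fact id -> [value, remaining evaluation count]

     def add(self, fact_id, value, expected_evaluations):
         self._facts[fact_id] = [value, expected_evaluations]

     def get(self, fact_id):
         return self._facts[fact_id][0]

     def evaluation_fired(self, fact_id):
         entry = self._facts[fact_id]
         entry[1] -= 1
         if entry[1] == 0:         # no assertion will need this fact again
             del self._facts[fact_id]

 store = FactStore()
 store.add("f1", 42, expected_evaluations=1)   # f1 binds to a single assertion, e.g. A > 0
 assert store.get("f1") > 0                    # the assertion fires...
 store.evaluation_fired("f1")                  # ...and the memory for f1 is released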
Slicing the instance into reporting units
Some taxonomies, like the European banking and insurance supervisory taxonomies defined by EBA, EIOPA and other supervisors in Europe, use the concept of reporting unit. Reporting units allow:
- partial filing, to implement proportionality and materiality principles (small reporters file less than big ones and only significant information is reported); and
- conditional triggering of business checks (aka XBRL assertions) corresponding to what has been reported, using "filing indicators". For taxonomies defined through templates, there is a correspondence between reporting units and templates.
Given a taxonomy, it is therefore possible to determine which reporting unit(s) a fact belongs to, and to slice large instances into smaller chunks that are easier to process. Each chunk corresponds to a reporting unit, or to a set of reporting units if assertions crossing reporting units are defined in this set.
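A hedged sketch of such slicing, assuming a mapping from each concept to its reporting unit has been derived from the taxonomy beforehand (the concept names, unit codes and file names below are illustrative only); each fact is streamed into the chunk file of its reporting unit:

 # Sketch: split a large instance into one chunk per reporting unit.
 from lxml import etree

 # Assumed: derived from the taxonomy (template membership of each concept).
 UNIT_OF_CONCEPT = {"{http://example.com/taxonomy}Loan": "C_101",
                    "{http://example.com/taxonomy}Asset": "S_06"}

 outputs = {}
 for _, elem in etree.iterparse("large-instance.xbrl", events=("end",)):
     if elem.get("contextRef") is not None:              # a fact
         unit = UNIT_OF_CONCEPT.get(elem.tag, "UNKNOWN")
         if unit not in outputs:
             outputs[unit] = open(f"chunk-{unit}.txt", "w", encoding="utf-8")
         outputs[unit].write(f"{elem.tag}\t{elem.get('contextRef')}\t{elem.text}\n")
         elem.clear()
 for out in outputs.values():
     out.close()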
Reporting errors
Big instances may lead to a large number of errors. Using test instances, some log files were several gigabytes in size.
When processing large instances that may produce a large number of errors, it may be wise to restrict the log file to data only and use a rendering mechanism, like XSLT, to present human-friendly error messages. Table linkbases may be used to present the errors in templates.
But this practice only limits the size of the log files and does not solve the whole problem. A mechanism stopping the process once a given number of errors has been reached may be implemented.
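As an illustration of both points, the sketch below keeps the log as compact machine-readable records (the human-friendly rendering, e.g. through XSLT or table linkbases, would be produced separately) and aborts once a configurable threshold is reached; names and the threshold value are arbitrary:

 # Sketch: data-only error log with a hard limit on the number of errors.
 import json

 class TooManyErrors(Exception):
     pass

 class ErrorLog:
     def __init__(self, path, max_errors=10000):
         self._out = open(path, "w", encoding="utf-8")
         self._max = max_errors
         self.count = 0

     def report(self, rule_id, fact_id, detail):
         # Data only: the readable message is rendered later, not stored here.
         self._out.write(json.dumps({"rule": rule_id, "fact": fact_id,
                                     "detail": detail}) + "\n")
         self.count += 1
         if self.count >= self._max:
             self._out.close()
             raise TooManyErrors(f"stopped after {self.count} errors")

 log = ErrorLog("errors.jsonl", max_errors=3)
 try:
     for i in range(10):
         log.report("v-example-001", f"fact{i}", "value must be non-negative")
 except TooManyErrors as stop:
     print(stop)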