Contract-level data: scope and sources
The Government Transparency Institute has collected administrative data on government contracts and tenders from official public procurement portals and public institutions’ data repositories. The database covers 170 countries: for a third of them the data come from national public procurement sources, and for the rest from contracts financed by multilateral financial institutions. While some countries report as far back as 1961, reliable electronic public procurement portals only became available in the early 2000s. In total, the dataset includes over 45 million contracts, more than 5 million suppliers and 1 million buyers.
The global contract-level dataset – the first of its kind – is the compilation of a range of data collected through source-specific methods, but parsed into a uniform structure and subject to a standard set of data quality checks. The sources in detail are the following:
Table 1. Data sources
All harmonized data are analysed and visualised on the Global Procurement Anticorruption and Transparency Platform developed jointly with the World Bank: www.procurementintegrity.org/.
An Application: Integrity of Procurement Process Index
Integrity indicators are calculated for each contract in the global dataset. These indicators derive from a thorough understanding of corrupt schemes in public procurement, and they are validated against other established indicators as well as price information (on the expectation that corruption risks drive prices up). Individual integrity risk indicators are averaged to produce a composite score, with high values indicating a high degree of integrity in the procurement process and low values a low degree.
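The averaging step can be illustrated with a minimal Python sketch. It assumes that each indicator has already been scaled to the 0-1 range with 1 meaning higher integrity, that missing indicators are skipped rather than counted as zero, and that the result is rescaled to 0-100. The function name and indicator names are hypothetical and are not taken from the project's code.

```python
from statistics import mean
from typing import Mapping, Optional


def composite_integrity(indicators: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average the available indicators of one contract and rescale to 0-100.

    Assumption: every indicator is on a 0-1 scale where 1 means higher integrity.
    Assumption: missing indicators (None) are left out of the average.
    """
    available = [value for value in indicators.values() if value is not None]
    if not available:
        return None  # no indicator observed, so no score can be computed
    return 100 * mean(available)


# Hypothetical contract: open procedure, call published, but a single bidder;
# the tax haven indicator is missing.
example = {
    "open_procedure": 1.0,
    "call_published": 1.0,
    "multiple_bidders": 0.0,
    "not_tax_haven": None,
}
score = composite_integrity(example)
```

Skipping missing values keeps contracts with partial information in the dataset, at the cost of scores that rest on fewer indicators.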
For a full review of indicators considered see: Fazekas, Mihály, Luciana Cingolani, & Bence Tóth (2018), Innovations in Objectively Measuring Corruption in Public Procurement. In Helmut K. Anheier, Matthias Haber, and Mark A. Kayser (eds.) Governance Indicators. Approaches, Progress, Promise. Ch. 7. Oxford University Press, Oxford.
For examples of indicator validity tests see: Fazekas, Mihály, and Kocsis, Gábor, (2020), Uncovering High-Level Corruption: Cross-National Corruption Proxies Using Public Procurement Data. British Journal of Political Science.
The list of comparatively applied risk indicators based on which the composite integrity indicator is calculated is the following:
Table 2. Components of the full integrity index (higher values indicate higher integrity, i.e. lower corruption)
Component name | Category of component | Risk behavior captured by this component |
Use of non-open procedure types | Procurement process risk | Non-open procedures such as direct awards or negotiated procedures without prior publication allow for selecting the connected bidder as the supplier without competition. |
Call for tenders is published | Procurement process risk | A missing call for tenders limits competition by making it harder for bidders to learn about the tendering opportunity. Lack of competition, in turn, allows buyers to award the contract to a well-connected company which has been informed about the opportunity informally. |
Length of advertisement period (time between tender advertisement and the submission deadline) | Procurement process risk | When the advertisement period is short, non-connected companies are unlikely to bid successfully compared to connected firms, allowing the latter to win more often. |
Single bidder contract | Procurement process risk | Single bidding is the simplest indication of restricted competition, which is a fundamental element of corruption in public procurement. |
Length of decision period (time between submission deadline and contract award decision date) | Procurement process risk | The period between the submission deadline and the contract award decision can signal corruption risks. Short periods could imply a contrived decision, while longer periods may flag legal difficulties, which could imply a limitation of competition. |
Benford’s law | Procurement process risk | Benford’s law is an observation about the leading digits of naturally occurring collections of numbers. It states that the leading digit is likely to be small: in sets that obey the law, 1 appears as the leading digit about 30% of the time, while 9 appears as the leading digit less than 5% of the time. A high value of this indicator means that contract prices conform to Benford’s law and thus resemble a naturally occurring collection of numbers, so it is less likely that prices were manipulated. |
Supplier’s share in the buyer’s annual public procurement spending | Supplier risk | A large share of one supplier in the buyer’s annual spending may indicate that a supplier benefits from recurrent favouritism by the buyer. It also signals that competition is limited and non-open which could result in higher pricing, lower quality, and/or lower value for money. |
Supplier is registered in a tax haven | Supplier risk | Suppliers registered in tax havens or opaque jurisdictions can more easily hide corrupt profits from both tax authorities and corruption investigators. |
Delivery delay (relative contract length increase) | Procurement process risk | Delayed delivery can influence the contract’s value for money, quality of product or service, or even result in incomplete projects with societal implications. |
Cost overrun | Procurement process risk | Increases in contract value after the award can have valid grounds in some cases, but they can also allow the extraction of unjustified profits, hide bribes paid to secure the contract award, or cover expenses if the favored business could only win the contract by offering the lowest price. |
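The Benford’s law component in Table 2 rests on the expected share of each leading digit, log10(1 + 1/d). The Python sketch below computes these expected shares and compares them with the observed leading digits of a set of contract prices. The mean absolute deviation used here is only one possible conformity measure and is an assumption; the source does not state how the indicator is actually computed or at which level prices are grouped.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def benford_expected(digit: int) -> float:
    """Expected share of a leading digit (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{abs(value):.15e}"[0])


def observed_shares(prices: Iterable[float]) -> Dict[int, float]:
    """Share of each leading digit among the positive prices."""
    digits = [leading_digit(p) for p in prices if p > 0]
    if not digits:
        raise ValueError("at least one positive price is required")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(prices: Iterable[float]) -> float:
    """Assumed conformity measure: lower values mean closer to Benford's law."""
    shares = observed_shares(prices)
    return sum(abs(shares[d] - benford_expected(d)) for d in range(1, 10)) / 9


# Hypothetical contract prices for one buyer.
prices = [120, 1500, 19, 230, 3100, 101, 45, 1.2e6, 870, 99]
deviation = mean_absolute_deviation(prices)
```

With only a handful of prices the observed shares are noisy, so a measure of this kind is only meaningful for groups with many contracts.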
It is possible to construct a narrower integrity index restricted to those elements which are more directly associated with corruption or its absence. This narrower integrity index includes only four indicators: non-open procedures, single bidding, Benford’s law, and supplier registration in a tax haven. While the narrower focus makes the concept and its interpretation clearer, it also raises the problem that, where many indicator values are missing, the aggregate country scores may become unreliable. For such countries, which can be recognised by extreme scores such as integrity=100, it is advisable to replace the narrow integrity score with the broader integrity score.
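The suggested replacement rule can be written down as a small Python sketch. The text only gives integrity=100 as an example of an extreme score; treating a score of 0 or a missing narrow score as unreliable as well is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def reported_score(narrow: Optional[float], broad: Optional[float]) -> Optional[float]:
    """Return the narrow country score unless it looks unreliable.

    Assumption: a narrow score is unreliable when it is missing or sits at
    either end of the 0-100 scale; in that case the broad score is used.
    """
    narrow_is_unreliable = narrow is None or narrow <= 0 or narrow >= 100
    return broad if narrow_is_unreliable else narrow


# Hypothetical country scores.
fallback_used = reported_score(100.0, 71.5)   # extreme narrow score, broad score returned
narrow_kept = reported_score(64.0, 71.5)      # plausible narrow score is kept
```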
Showcasing comparable cross-country integrity scores
Our contract-level database enables cross-country research and analysis. However, because the data are highly diverse in terms of the economic sectors covered, the complexity of the contracts, and the presence of contracts not awarded, they have to be narrowed down to a more comparable subset (differences in country or funder procurement regulations cannot, by and large, be corrected for). One such cross-country database can be created for high-value construction sector contracts in 170 countries by combining contracting data from all available sources, namely tenders published by a) the World Bank (WB), b) the Inter-American Development Bank (IADB) and c) national procurement portals.
First, starting with the complete contract-level database, we consider only contracts awarded to publicly known suppliers (i.e. contracts where the name of the winning company is transparently presented). Second, we restrict the dataset to the construction sector using the European Common Procurement Vocabulary (CPV) classification (2-digit CPV code = 45). To apply this filter, we harmonized product codes across all countries to match the CPV classification, and applied a keyword-based product code assignment algorithm where the publication source reports no product classification. Where product codes are largely missing due to source data problems, we also resort to using supply type (supply type = WORKS).
Third, only contracts awarded after 2000 are considered because data from earlier years are of lower quality. The time filter is based on a) the contract award date, b) the contract signature date, c) the date of the call for tender publication or d) the award decision date, in the order of their availability. Fourth, only contracts worth 100,000 USD or more are considered, to exclude small-value contracts which are typically less competitive by nature and often subject to more lenient transparency rules (Erica Bosio, Simeon Djankov, Edward L. Glaeser & Andrei Shleifer (2020) Public Procurement in Law and Practice. NBER Working Paper 27188. DOI 10.3386/w27188.). Contract values are in net terms (i.e. excluding VAT) and are normalized to international US dollars using PPP conversion rates published by the World Bank. Note that data from all contracts which pass through these four filters, irrespective of their source or any further tender features such as being a framework contract, are included in the aggregation. Moreover, all countries which have at least one observed contract are included; however, country scores based on very few contracts, such as 3-5 contracts, are likely to be unreliable (e.g. Japan, with only 3 contracts, all of which come from the World Bank dataset).
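The four filters described above can be sketched in Python as follows. The field names and the `Contract` class are hypothetical and do not reflect the dataset's actual schema. The sketch also makes two simplifying assumptions: "after 2000" is read as a reference year later than 2000, and the supply type fallback is applied per contract whenever the CPV code is missing, whereas the text applies it to sources where product codes are largely missing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contract:
    supplier_name: Optional[str]
    cpv_code: Optional[str]          # harmonized CPV code, e.g. "45210000"
    supply_type: Optional[str]       # e.g. "WORKS", "SUPPLIES", "SERVICES"
    value_usd_ppp: Optional[float]   # net value in international US dollars
    award_date: Optional[date] = None
    signature_date: Optional[date] = None
    call_publication_date: Optional[date] = None
    award_decision_date: Optional[date] = None


def reference_date(c: Contract) -> Optional[date]:
    """First available date, in the order given in the text."""
    candidates = (c.award_date, c.signature_date,
                  c.call_publication_date, c.award_decision_date)
    return next((d for d in candidates if d is not None), None)


def is_construction(c: Contract) -> bool:
    """2-digit CPV code 45, or supply type WORKS when no CPV code is present."""
    if c.cpv_code:
        return c.cpv_code.startswith("45")
    return c.supply_type == "WORKS"


def in_comparable_subset(c: Contract) -> bool:
    """Apply the four filters: known supplier, construction, year, value."""
    ref = reference_date(c)
    return (
        bool(c.supplier_name)
        and is_construction(c)
        and ref is not None and ref.year > 2000
        and c.value_usd_ppp is not None and c.value_usd_ppp >= 100_000
    )


# Hypothetical records.
contracts = [
    Contract("Builder Ltd", "45210000", None, 250_000.0, award_date=date(2015, 3, 1)),
    Contract("Builder Ltd", "45210000", None, 40_000.0, award_date=date(2015, 3, 1)),
    Contract(None, "45210000", None, 250_000.0, award_date=date(2015, 3, 1)),
    Contract("Roads Co", None, "WORKS", 900_000.0, signature_date=date(2010, 6, 30)),
]
subset = [c for c in contracts if in_comparable_subset(c)]
```

Keeping each filter in its own function makes it easy to report how many contracts each step removes.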
Using these filters produces a dataset of about 1.2 million construction contracts. The integrity index can be used as a dependent variable in studies that investigate the link between legal or regulatory reforms and contract-level outcomes. If laws and regulations are adequately measured, the integrity index can also be used as a measure of practice to show the gap between what the law says and what happens in practice. While the integrity index theoretically varies between 0 and 100, in practice the highest-scoring countries such as Cyprus, Latvia or Finland achieve only 82-92 points. At the other end of the spectrum, the lowest-scoring countries such as El Salvador, Suriname or Iraq achieve 38-43 points.
The contract-level dataset generated can be downloaded here.
Acknowledgments and terms
We are grateful for the generous and wide-ranging funding received from the European Commission’s Horizon 2020 research program, the UK FCDO’s Anticorruption Evidence Program and the Open Society Foundations.
All data and code produced as part of this project are licensed under Creative Commons BY-NC-SA 4.0.