Hidden barriers to open competition: Using text mining to uncover corrupt restrictions to competition in public procurement

Eszter Katona & Mihály Fazekas (2024). Hidden barriers to open competition: Using text mining to uncover corrupt restrictions to competition in public procurement. GTI-WP/2024:01, Budapest: Government Transparency Institute

Public procurement accounts for one third of government spending across the world, and it is also particularly vulnerable to corruption. Large amounts of open administrative data have enabled a rich literature on measuring corruption. However, this scholarship largely focuses on structured information on government tenders and neglects text fields, which are particularly suitable for hiding wrongdoing. To address this gap, this article identifies strategies for limiting competition by tailoring tendering terms to a favoured bidder. We argue that corrupt actors employ subtle, text-based strategies when more visible strategies for favouritism, such as non-competitive tendering procedures, are undesirable or impractical. Using data on 119,000 contracts from all government tenders published in Hungary between 2011 and 2020, we deploy a host of traditional regression models and advanced machine learning models such as Random Forests. We find that specific phrases in bidding conditions, product descriptions and assessment criteria lead to single bidding in otherwise competitive markets. Including text data improves model accuracy from 77% (structured variables only) to 82% (structured variables and all text data combined). We unpack our complex machine learning models by pinpointing terms conducive to deliberate market access restrictions, such as overly specific bidding eligibility criteria. We demonstrate that text mining can advance our understanding of corrupt behaviours and help target anti-corruption policies more effectively.
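
The abstract links specific phrases in tender texts to single bidding. As a far simpler, hypothetical illustration of that idea (not the authors' regression or Random Forest pipeline, and with no controls for market or other structured tender characteristics), the sketch below compares each term's single-bid rate with the overall single-bid rate across tenders and flags terms that sit well above it. The function names, the stopword list, the `min_count` and `margin` thresholds and the toy tenders are all invented for illustration and are not data or results from the paper.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list so that filler words are not flagged.
STOPWORDS = {"a", "and", "for", "in", "of", "the"}


def tokenize(text):
    """Lowercase a tender text and split it into word tokens, dropping stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def term_single_bid_rates(tenders, min_count=2):
    """Map each term to (share of tenders containing it that got one bid, count).

    `tenders` is a list of (text, n_bids) pairs; a term counts once per tender.
    Terms found in fewer than `min_count` tenders are dropped as too sparse.
    """
    containing = Counter()
    single = Counter()
    for text, n_bids in tenders:
        for term in set(tokenize(text)):
            containing[term] += 1
            if n_bids == 1:
                single[term] += 1
    return {
        term: (single[term] / n, n)
        for term, n in containing.items()
        if n >= min_count
    }


def flag_terms(tenders, min_count=2, margin=0.2):
    """Return terms whose single-bid rate exceeds the overall rate by >= `margin`.

    Results are sorted by descending single-bid rate, then alphabetically.
    """
    baseline = sum(1 for _, n_bids in tenders if n_bids == 1) / len(tenders)
    rates = term_single_bid_rates(tenders, min_count)
    flagged = [t for t, (rate, _) in rates.items() if rate - baseline >= margin]
    return sorted(flagged, key=lambda t: (-rates[t][0], t))


# Toy, invented tenders: (text of the tendering terms, number of bids received).
TOY_TENDERS = [
    ("supply of office paper", 4),
    ("supply of office paper and toner", 3),
    ("bidder must hold iso certificate issued in the county", 1),
    ("road resurfacing, bidder must hold iso certificate issued in the county", 1),
    ("road resurfacing works", 5),
    ("catering services for schools", 2),
]

if __name__ == "__main__":
    print(flag_terms(TOY_TENDERS))
```

On the toy data, the eligibility wording shared by the two single-bid tenders is flagged, while "road" (one single-bid and one five-bid tender) stays below the margin. A raw association screen like this cannot separate restrictive wording from genuinely thin markets, which is why the paper relies on models that also include structured tender variables.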
