CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking.

Journal: Scientific data
PMID:

Abstract

Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector requires automated interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). To address these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only self-contained sentences, which express complete rules without needing additional context, were included, as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables the application of recent techniques, such as deep neural networks and large language models, to ACC.
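
To make the idea of a sentence annotated with entities and relations concrete, the following is a minimal sketch of one way such a record could be represented in memory. It is not the dataset's actual file format or label scheme: the class names, the helper `find_entity`, the labels "object", "value" and "min-value", and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    start: int  # character offset into the sentence, inclusive
    end: int    # character offset into the sentence, exclusive
    label: str  # entity category (hypothetical name, not from the dataset)


@dataclass
class Relation:
    head: int   # index of the head entity in AnnotatedSentence.entities
    tail: int   # index of the tail entity in AnnotatedSentence.entities
    label: str  # relation category (hypothetical name, not from the dataset)


@dataclass
class AnnotatedSentence:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def span(self, index: int) -> str:
        """Return the surface text of the entity at the given index."""
        entity = self.entities[index]
        return self.text[entity.start:entity.end]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Flatten relations into (head text, relation label, tail text)."""
        return [(self.span(r.head), r.label, self.span(r.tail))
                for r in self.relations]


def find_entity(text: str, phrase: str, label: str) -> Entity:
    """Hypothetical helper: locate the first occurrence of a phrase."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Entity(start, start + len(phrase), label)


# Invented example sentence, not taken from CODE-ACCORD.
text = "The door width shall be at least 800 mm."
sentence = AnnotatedSentence(
    text=text,
    entities=[
        find_entity(text, "door width", "object"),
        find_entity(text, "800 mm", "value"),
    ],
    relations=[Relation(head=0, tail=1, label="min-value")],
)

for triple in sentence.triples():
    print(triple)
```

Flattening each relation into a (head, label, tail) triple is one plausible intermediate step between annotated text and a machine-readable rule; an entity recognition model would predict the spans, and a relation extraction model would predict the links between them.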

Authors

  • Hansi Hettiarachchi
    Faculty of Science and Technology, Lancaster University, Lancaster, LA1 4WA, UK. h.hettiarachchi@lancaster.ac.uk.
  • Amna Dridi
    Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, B4 7XG, UK.
  • Mohamed Medhat Gaber
    Robert Gordon University, Garthdee House, Garthdee Road, Aberdeen AB10 7QB, UK.
  • Pouyan Parsafard
    Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, B4 7XG, UK.
  • Nicoleta Bocaneala
    Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, B4 7XG, UK.
  • Katja Breitenfelder
    Fraunhofer Institute for Building Physics IBP, Department Indoor Climate and Climatic Impacts, Fraunhofer Str. 10, 83626, Valley, Germany.
  • Gonçal Costa
    Human Environment Research (HER), La Salle, Ramon Llull University, Barcelona, Catalonia, Spain.
  • Maria Hedblom
    Department of Computing, School of Engineering, Jönköping University, Box 1026, 551 11, Jönköping, Sweden.
  • Mihaela Juganaru-Mathieu
    Mines Saint-Etienne, Institut Henri Fayol, Département ISI, F - 42023, Saint-Etienne, France.
  • Thamer Mecharnia
    Université de Lorraine, CNRS, LORIA, 54506, Vandœuvre-lès-Nancy, France.
  • Sumee Park
    Fraunhofer Institute for Building Physics IBP, Department Indoor Climate and Climatic Impacts, Fraunhofer Str. 10, 83626, Valley, Germany.
  • He Tan
    Department of Computing, School of Engineering, Jönköping University, Box 1026, 551 11, Jönköping, Sweden.
  • Abdel-Rahman H Tawil
    Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, B4 7XG, UK.
  • Edlira Vakaj
    Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, B4 7XG, UK.