JuriBERT: A Masked-Language Model Adaptation for French Legal Text
9 October 2023
Because some specialised tasks do not benefit from generic language models pre-trained on large amounts of general-purpose data, this research project investigated the adaptation of domain-specific BERT models to the French legal domain, with the ultimate goal of helping law professionals. The project further explored the use of smaller architectures for domain-specific sub-languages.
The resulting set of BERT models, called JuriBERT, showed that domain-specific pre-trained models can outperform their generic counterparts in the legal domain.
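To make the adaptation concrete, the sketch below shows what domain-specific masked-language-model pre-training typically looks like with the Hugging Face transformers library. The corpus file, tokenizer path, and the small model dimensions are illustrative assumptions for this sketch, not the project's actual configuration.

```python
# Sketch: pre-training a small BERT masked language model on a legal corpus.
# Paths ("./legal-tokenizer", "legal_corpus.txt") and model sizes are
# hypothetical placeholders, not JuriBERT's published settings.
from datasets import load_dataset
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed inputs: a plain-text legal corpus and a tokenizer trained on it.
tokenizer = BertTokenizerFast.from_pretrained("./legal-tokenizer")
dataset = load_dataset("text", data_files={"train": "legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# A deliberately small architecture, in the spirit of the project's
# exploration of smaller models for domain-specific sub-languages.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)
model = BertForMaskedLM(config)

# Standard BERT objective: randomly mask 15% of tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="juribert-small", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```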
In particular, the team applied JuriBERT to speed up case assignment between the Cour de cassation's distinct formations, a task that had until then been done manually and substantially slowed cassation proceedings. The model was able to accurately predict the most relevant formation for judgment from the text of the appeal brief. The research also included preliminary results on estimating the complexity of a given case, again based on the text of the appeal brief.
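The routing task itself is standard text classification. The sketch below illustrates one way to fine-tune and query such a classifier; the checkpoint name and the label set are assumptions for illustration, not the Cour's actual taxonomy or the project's released model.

```python
# Sketch: routing an appeal brief to a formation via sequence classification.
# "juribert-small" and FORMATIONS are hypothetical placeholders.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

FORMATIONS = ["civ1", "civ2", "civ3", "com", "soc", "crim"]  # assumed label set

tokenizer = BertTokenizerFast.from_pretrained("juribert-small")
model = BertForSequenceClassification.from_pretrained(
    "juribert-small", num_labels=len(FORMATIONS)
)

def predict_formation(appeal_brief: str) -> str:
    """Return the most likely formation for the text of an appeal brief."""
    inputs = tokenizer(
        appeal_brief, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return FORMATIONS[int(logits.argmax(dim=-1))]
```

In such a setup the classification head would be fine-tuned on briefs labelled with their final formation, so that incoming appeals can be pre-sorted before human review.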
Partners
Cour de cassation, Ordre des avocats au Conseil d'État et à la Cour de cassation, HEC Paris, Polytechnique Paris, Hi!Paris