Inclusive Language

In this research, I worked on NLP for inclusive language as part of the E-MIMIC (Empowering Multilingual Inclusive comMunICation) project, a joint effort of the linguistics and Deep Learning Natural Language Understanding research communities to fight non-inclusive and prejudiced language forms. E-MIMIC aims at:

  • Fostering inclusive communications in real-world scenarios
    • Detecting and overcoming language inclusivity issues
      • Grammatical asymmetry (silencing the feminine form)
      • Semantic asymmetry (presence of stereotypes)
  • Exploiting a deep learning pipeline to generalize and automate the process
    • It leverages language models trained on purpose-specific corpora
    • It can detect non-inclusive expressions and suggest inclusive alternatives
  • Currently focusing on Academic and Public Administration Italian documents
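
To make the "detect non-inclusive expressions and suggest inclusive alternatives" step concrete, here is a minimal sketch in Python. It is not the E-MIMIC pipeline: the project relies on trained language models, while this toy version swaps in a hand-written lookup table. The lexicon entries, the `Finding` class, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, hand-written lexicon (assumption for illustration only).
# E-MIMIC uses trained language models, not a lookup table like this one.
ALTERNATIVES = {
    "gli studenti": ["le studentesse e gli studenti", "la comunità studentesca"],
    "i professori": ["le professoresse e i professori", "il corpo docente"],
    "i cittadini": ["le cittadine e i cittadini", "la cittadinanza"],
}


@dataclass
class Finding:
    start: int              # character offset where the expression starts
    end: int                # character offset just past the expression
    text: str               # the matched expression as written
    suggestions: list[str]  # candidate inclusive alternatives


def find_non_inclusive(sentence: str) -> list[Finding]:
    """Return lexicon matches in the sentence, ordered by position."""
    findings = []
    for phrase, suggestions in ALTERNATIVES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            findings.append(Finding(m.start(), m.end(), m.group(0), suggestions))
    return sorted(findings, key=lambda f: f.start)


def rewrite(sentence: str, findings: list[Finding]) -> str:
    """Replace each finding with its first suggestion.

    Replacements run right to left so earlier offsets stay valid.
    """
    out = sentence
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        out = out[:f.start] + f.suggestions[0] + out[f.end:]
    return out


if __name__ == "__main__":
    sentence = "Gli studenti e i professori sono invitati."
    found = find_non_inclusive(sentence)
    for f in found:
        print(f.text, "->", f.suggestions)
    print(rewrite(sentence, found))
```

The sketch ignores capitalization of the replacement and grammatical agreement in the rest of the sentence (for example, the participle "invitati" is left untouched), which is part of why a learned model is a better fit for this task than string substitution.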

Article

Attanasio, G., Greco, S., Apiletti, La Quatra, M., Cagliero, L., Tonti, M., Cerquitelli, T., Raus, R. “E-MIMIC: Empowering Multilingual Inclusive Communication,” 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021, pp. 4227-4234, doi: 10.1109/BigData52589.2021.9671868.

Preserving diversity and inclusion is becoming a compelling need in both industry and academia. The ability to use appropriate forms of writing, speaking, and gestures is not widespread even in formal communications such as public calls, public announcements, official reports, and legal documents. The improper use of linguistic expressions can foment unacceptable forms of exclusion, stereotypes, and verbal violence against minorities, including women. Furthermore, existing machine translation tools are not designed to generate inclusive content. The present paper investigates a joint effort of the research communities of linguistics and Deep Learning Natural Language Understanding in fighting against non-inclusive, prejudiced language forms. It presents a methodology aimed at tackling the improper use of language in formal communication, with particular attention paid to Romance languages (Italian, in particular). State-of-the-art Deep Language Modeling architectures are exploited to automatically identify non-inclusive text snippets, suggest alternative forms, and produce inclusive text rephrasing. A preliminary evaluation conducted on a benchmark dataset shows promising results, i.e., 85% accuracy in predicting inclusive/non-inclusive communications.

Article

La Quatra, M., Greco, S., Cagliero, L., Cerquitelli, T. Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023, vol. 14175. Springer, Cham. https://doi.org/10.1007/978-3-031-43430-3_31

Inclusive writing is compulsory in formal communications. However, employees in private organizations, universities, and ministries often lack inclusive writing skills. For example, despite Italian grammar having masculine and feminine declensions of words, many official documents have a disrespectful prevalence of the masculine form. To promote inclusive writing practices, we present INCLUSIVELY, a language support tool that leverages natural language processing techniques to automatically identify instances of non-inclusive language and suggest more inclusive alternatives. The tool can be used as a text proofreader and, at the same time, fosters self-learning of inclusive writing forms. The recorded demo of the tool, available at https://youtu.be/3uiW_ti8wmY, shows how end-users can interact with INCLUSIVELY to feed new data, visualize the non-inclusive pieces of text, explore the list of alternative forms, and provide feedback or human annotations for system fine-tuning.
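
The feedback step mentioned above (users accepting, rejecting, or correcting suggestions so the system can be fine-tuned) could be recorded roughly as follows. This is a sketch under assumptions: the abstract does not describe INCLUSIVELY's actual data format, so the `Feedback` record, its fields, and `to_training_pairs` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """One user judgment on a suggested rewrite (hypothetical schema)."""
    original: str                     # non-inclusive text flagged by the tool
    suggestion: str                   # alternative proposed by the tool
    accepted: bool                    # True if the user kept the suggestion
    correction: Optional[str] = None  # user-written alternative, if any


def to_training_pairs(feedback: list[Feedback]) -> list[tuple[str, str]]:
    """Turn feedback into (source, target) pairs usable for fine-tuning.

    Accepted suggestions become targets as-is; rejected ones are kept only
    when the user supplied a correction, otherwise they are skipped.
    """
    pairs = []
    for fb in feedback:
        if fb.accepted:
            pairs.append((fb.original, fb.suggestion))
        elif fb.correction:
            pairs.append((fb.original, fb.correction))
    return pairs


if __name__ == "__main__":
    log = [
        Feedback("i professori", "il corpo docente", accepted=True),
        Feedback("gli studenti", "la comunità studentesca", accepted=False,
                 correction="le studentesse e gli studenti"),
        Feedback("i cittadini", "la cittadinanza", accepted=False),
    ]
    for source, target in to_training_pairs(log):
        print(source, "->", target)
```

Keeping the rejected-but-corrected cases is a deliberate choice in this sketch: a human-written alternative is the most direct kind of annotation a proofreading tool can collect.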

Article GitHub Repo