Research Topics

Our commitment to advancing multilingual corpus modelling is reflected in our contributions to several research initiatives:

  • Extractor: A corpus extractor and annotator
    We developed Extractor, an online tool that helps extract insights from content.

  • Darija Corpus: Training a large language model (LLM) on Moroccan Darija
    We are working on a language model for Moroccan Darija.

  • Multilingual visual graphics with audio and text representations
    We are improving the AI model with visual data recognition.

Our research program spans a diverse range of advanced methodologies in data processing and linguistic analysis. We explore data annotation strategies, emphasizing both the accuracy of manual annotation across various modalities—such as images, text, and sound—and the efficiency gains achievable through automated annotation techniques, particularly with Python. This dual approach allows for a comprehensive understanding of how data can be effectively tagged and categorized, whether through human expertise or machine learning models. Complementing this, we apply sophisticated text analytics to dissect the nuanced ways in which language functions within specific contexts.
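
As a rough illustration of automated annotation in Python, the sketch below tags whole-word matches from a small lexicon with a label and records their character offsets. It is a minimal, rule-based example only: the `Annotation` type, the `LEXICON` entries, and the `annotate` function are hypothetical names introduced here, not part of our tooling, and a production pipeline would more likely rely on trained models.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # matched surface form
    label: str   # label assigned from the lexicon


# Hypothetical lexicon (surface form -> label), for illustration only.
LEXICON = {
    "Darija": "LANGUAGE",
    "Arabic": "LANGUAGE",
    "Morocco": "LOCATION",
}


def annotate(text: str, lexicon: dict[str, str] = LEXICON) -> list[Annotation]:
    """Tag every whole-word lexicon match, sorted by position in the text."""
    spans = []
    for term, label in lexicon.items():
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, text):
            spans.append(Annotation(match.start(), match.end(), match.group(), label))
    return sorted(spans)


sample = "Darija is spoken in Morocco alongside Arabic."
annotations = annotate(sample)
for ann in annotations:
    print(ann.start, ann.end, ann.text, ann.label)
```

Output of this kind can serve as a first pass that human annotators then review and correct, which is where the manual and automated approaches meet.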

Building on these insights, our work extends to the field of content translation and enrichment, where we seek to enhance the precision, cultural appropriateness, and contextual relevance of translated materials. This includes addressing challenges in adapting content to varying cultural contexts without losing meaning or nuance. In addition, we tackle the complexities of software localisation. Our research also delves into the intricate relationship between language and culture, exploring how linguistic choices reflect and influence cultural norms and values. We underscore the importance of terminology standardization as a tool for achieving clarity, consistency, and effective communication across sectors and languages.

Finally, we engage in the development of cutting-edge knowledge representation techniques, such as linked data, taxonomies, and knowledge graphs. These frameworks enable us to structure, interlink, and enrich information, facilitating more efficient retrieval and deeper, context-aware understanding across a range of domains.
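
To make the idea of a taxonomy expressed as linked statements more concrete, here is a small Python sketch that stores subject-predicate-object triples and walks "broader" links to list the ancestors of a term. The triples, the predicate names, and the helper functions `build_index` and `ancestors` are assumptions chosen for illustration; real linked-data work would typically use an RDF store and a standard vocabulary instead.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object); names are assumptions.
TRIPLES = [
    ("Darija", "broader", "Arabic"),
    ("Arabic", "broader", "Semitic languages"),
    ("Darija", "spokenIn", "Morocco"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for direct lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def ancestors(term, index, predicate="broader"):
    """Follow `predicate` links upward and return ancestors, nearest first."""
    found, queue = [], [term]
    while queue:
        current = queue.pop(0)
        for parent in index.get((current, predicate), []):
            if parent not in found:  # also guards against cycles
                found.append(parent)
                queue.append(parent)
    return found


index = build_index(TRIPLES)
darija_ancestors = ancestors("Darija", index)
print(darija_ancestors)
```

Keeping relations as explicit triples means new predicates can be added without changing the traversal code, which is one reason graph-style representations suit interlinked, multilingual data.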