LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti

Tabia Tanzin Prama

Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), 2025

Times cited: 2

Abstract:

Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based Machine Translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1-mini, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla↔Sylheti), and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (core vocabulary and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently improves translation quality across models and prompting strategies. Both automatic metrics and human evaluations confirm its effectiveness, while qualitative analysis reveals notable reductions in hallucinations, ambiguities, and awkward phrasing, establishing Sylheti-CAP as a scalable solution for dialectal and low-resource MT.
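
As a rough illustration of the three-step structure described in the abstract, the sketch below assembles a prompt from a rulebook, a dictionary, and an authenticity check. It is not the paper's implementation. The names `CapResources` and `build_cap_prompt`, the prompt wording, and the section layout are all assumptions made for illustration, and the rulebook and dictionary contents are left for the caller to supply rather than invented here.

```python
from dataclasses import dataclass, field


@dataclass
class CapResources:
    """Hypothetical container for the three components named in the abstract."""
    rules: list[str] = field(default_factory=list)             # linguistic rulebook
    dictionary: dict[str, str] = field(default_factory=dict)   # core vocabulary and idioms
    authenticity_check: str = ""                               # final self-check instruction


def build_cap_prompt(source_text: str, source_lang: str, target_lang: str,
                     resources: CapResources) -> str:
    """Embed rulebook, dictionary, and authenticity check into one prompt string.

    The wording and ordering are assumptions, not the paper's actual prompt.
    """
    lines = [f"Translate the following {source_lang} text into {target_lang}.", ""]

    # Step 1: linguistic rulebook
    lines.append("Step 1. Linguistic rulebook:")
    lines += [f"- {rule}" for rule in resources.rules]
    lines.append("")

    # Step 2: dictionary of core vocabulary and idioms
    lines.append("Step 2. Dictionary (core vocabulary and idioms):")
    lines += [f"- {src} -> {tgt}" for src, tgt in resources.dictionary.items()]
    lines.append("")

    # Step 3: authenticity check applied to the draft translation
    lines.append("Step 3. Authenticity check:")
    lines.append(resources.authenticity_check)
    lines.append("")

    lines.append(f"Text: {source_text}")
    return "\n".join(lines)
```

The resulting string could be sent to any of the evaluated models through whatever client is available; swapping `source_lang` and `target_lang` covers the reverse translation direction.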

BibTeX:

@inproceedings{prama2025c,
  author =	 {Prama, Tabia Tanzin},
  title =	 {{LLM}s for low-resource dialect translation using
                  context-aware prompting: {A} case study on {S}ylheti},
  booktitle =	 {Proceedings of the Second Workshop on Bangla
                  Language Processing (BLP-2025)},
  year =	 {2025},
  key =		 {},
  pages =	 {292--308},
  url =		 {https://aclanthology.org/2025.banglalp-1.24/},
}
