IndicMedDialog brings multi-turn medical chat to nine Indic languages
Researchers release IndicMedDialog, a parallel medical dialogue dataset covering English and nine Indic languages, alongside IndicMedLM, a fine-tuned model for symptom elicitation across ten languages.

Most existing medical dialogue systems operate in single-turn question-answering mode or rely on template-based datasets that lack conversational realism and multilingual reach. IndicMedDialog is a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. Researchers Shubham Kumar Nigam, Suparnojit Sarkar, and Piyush Patel built the dataset by extending MDDial with LLM-generated synthetic consultations. The dialogues were translated with TranslateGemma, verified by native speakers, and refined through a script-aware post-processing pipeline that corrects phonetic, lexical, and character-spacing errors.
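To give a sense of what a character-spacing correction might involve, here is a minimal Python sketch. It is not the authors' pipeline, whose rules are not described here. It assumes one common failure mode: a stray space inserted between a base letter and the combining vowel sign or diacritic that should attach to it. The function name `fix_combining_spacing` is hypothetical.

```python
import unicodedata


def fix_combining_spacing(text: str) -> str:
    """Drop spaces that sit directly before a combining mark.

    Assumption for illustration: in Indic and Arabic-script text, a
    combining mark (Unicode category Mn or Mc) should attach to the
    preceding letter, so a space in front of it is treated as an error.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Mc"):
            # Remove any spaces collected just before this mark.
            while out and out[-1] == " ":
                out.pop()
        out.append(ch)
    return "".join(out)


# Devanagari KA + space + VOWEL SIGN I is rejoined into one syllable.
broken = "\u0915 \u093f"
print(fix_combining_spacing(broken) == "\u0915\u093f")
```

A rule like this works off Unicode character categories rather than a per-language word list, so the same check applies to Devanagari, Bengali, Tamil, and the Arabic script used for Urdu. Phonetic and lexical errors would need language-specific resources that this sketch does not attempt.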
The team built IndicMedLM by fine-tuning a quantized small language model with parameter-efficient adaptation, incorporating optional patient pre-context to personalize multi-turn symptom elicitation. Evaluation included zero-shot multilingual baselines, systematic error analysis across all ten languages, and clinical plausibility validation through medical expert review. The preprint was posted on May 14, 2026.
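One way to picture the optional patient pre-context is as an extra line placed ahead of the dialogue history when the model input is assembled. The Python sketch below is purely illustrative: the function `build_prompt`, the "Patient context:" label, and the speaker tags are assumptions, not the format IndicMedLM uses.

```python
from typing import List, Optional, Tuple


def build_prompt(
    turns: List[Tuple[str, str]],
    pre_context: Optional[str] = None,
) -> str:
    """Assemble a model input from dialogue turns and optional pre-context.

    Hypothetical layout: an optional context line, then one line per
    turn, then an open "Doctor:" line for the model to continue.
    """
    lines = []
    if pre_context:
        # Pre-context is optional; skip the line entirely when absent.
        lines.append(f"Patient context: {pre_context.strip()}")
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance.strip()}")
    lines.append("Doctor:")
    return "\n".join(lines)


history = [("Patient", "I have had a headache for three days.")]
print(build_prompt(history, pre_context="45-year-old with diabetes"))
```

Keeping the pre-context optional means the same function covers both personalized and anonymous consultations, which matches the article's description of pre-context as something the model can use when it is available.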