Aman Chadha: Pioneering Indic LLMs in Collaboration with IITs

In a groundbreaking collaboration with Indian Institutes of Technology (IITs), Aman Chadha, a distinguished Stanford University alumnus and leader of the generative AI research team at AWS, is spearheading the creation of Indic Large Language Models (LLMs). This innovative endeavor represents a significant stride in AI technology tailored specifically to Indian languages and contexts.

Chadha’s primary focus is the development of India’s first medical LLM, serving Hindi and various other Indic languages. In partnership with IIT Patna, this ambitious project aims to fill the void in medical AI models dedicated to Indic languages, an arena currently dominated by English-centric models like Google’s Med-PaLM. The approach uses OpenHathi as the foundational LLM, which is then fine-tuned on medical data in Indic languages, a process made challenging by the intricate nature of medical jargon. To protect patient privacy, the project relies on anonymized data, reflecting the importance of ethical considerations in AI development.

Despite challenges, including India’s scarcity of the computational resources needed to train advanced models, Chadha remains optimistic and views constraints as catalysts for innovative solutions. Beyond the medical realm, the project underscores a broader movement toward culturally and linguistically inclusive AI models. Such inclusivity is pivotal for preserving diverse languages and improving access to digital services in local languages.

The significance of this work extends beyond individual efforts, aligning with larger initiatives such as the Bhashini project launched by the Indian government. This national project focuses on developing technologies for translating content across Indian languages and crowd-sourcing voice datasets to improve access to digital services. Educational institutions like IISc and IIT Madras, along with companies like Microsoft, actively contribute to building datasets for Indic languages. However, challenges persist, particularly data scarcity and fragmentation, especially for languages other than Hindi.

Tech Mahindra is addressing these challenges through Project Indus, which aggregates information from various platforms to construct datasets for diverse languages. The company is also tackling potential biases in AI models by combining human annotation with automatic techniques.

The development of Indic LLMs is a significant stride towards democratizing AI technology in India, rendering it more relevant and accessible to a broader section of the population. This effort underscores the need for increased government funding and support to foster AI ecosystems that cater to diverse linguistic and cultural backgrounds, ultimately contributing to a more inclusive and technologically empowered society.
