Ideas for ML computing


June 1, 2020

Disclaimer: here ML stands for Malayalam, not machine learning.

Swathanthra Malayalam Computing (SMC), in association with Tinkerhub, conducted a special program called People behind SMC. SMC is the premier organisation working in the area of Malayalam computing. It was interesting to listen to the experiences of various folks like Anivar, Santhosh Thottingal sir, Balasankar C, Jishnu Mohan, Kavya Manohar etc. The sessions were hosted by Hrishi Chettai on Tinkerhub's Instagram page. The recorded sessions can be found here.

After listening to these talks, I want to share some ideas that came to mind:

1. Teaching Language computing with Python

Santhosh Thottingal sir usually starts his beginner talks with a familiar question: has anyone tried to code in Malayalam? He then shows a C sample for Hello world, and the first time I saw it I was really fascinated, which led to [article link]. Seeing this, Adithya and Subin even wrote code with old-style Malayalam letters, and that was the first time I saw Malayalam digits in writing.

Programming with Malayalam text is quite different from programming with English text. In UTF-8, each Malayalam code point takes 3 bytes, whereas an ASCII letter takes 1 byte, and a single visible letter can be made up of more than one code point. It may be good to create a programming guide on how to program in Python for language computing across various applications.

I personally feel this would be a good topic for a talk at any PyCon conference, as this area is relatively under-explored.
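As a small illustration of the point about bytes, here is a minimal Python sketch that counts code points and UTF-8 bytes for the word "Malayalam" written in Malayalam script, and parses a number written in Malayalam digits. The helper name `describe` is my own, and the word is spelled with escape sequences only so that the code points are explicit.

```python
import unicodedata

# "Malayalam" in Malayalam script, spelled with escapes so each code point
# is visible: MA, LA, YA, vowel sign AA, LLA, anusvara.
word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"


def describe(text):
    """Return (code point count, UTF-8 byte count, Unicode names)."""
    names = [unicodedata.name(ch) for ch in text]
    return len(text), len(text.encode("utf-8")), names


count, size, names = describe(word)
print(count, "code points,", size, "bytes in UTF-8")
for name in names:
    print(name)

# For comparison, ASCII text uses one byte per character.
ascii_size = len("hello".encode("utf-8"))

# Malayalam digits are Unicode decimal digits, so int() accepts them.
malayalam_number = "\u0d67\u0d68\u0d69"  # digits one, two, three
value = int(malayalam_number)
print(value)
```

The same word therefore has different lengths depending on whether you count code points or bytes, which is worth keeping in mind when slicing strings or sizing buffers.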

2. Covid-19 dataset

Kerala has put up a commendable performance so far in facing the Covid-19 pandemic. KHA has been publishing daily health bulletins typeset in the Manjari font. One common criticism of the Kerala model is the lack of published scientific papers. Another idea I have is to create a dataset of KHA bulletins, so that people can study the Kerala model, and to place the dataset in the SMC project's text corpus. The initial data has been collected and uploaded. One remaining challenge is how to convert these PDFs into a format useful for data analysis. On Kaggle, the CORD-19 dataset was converted into a number of JSON files.
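One possible shape for the JSON step is sketched below. It assumes the text of each bulletin has already been extracted from its PDF by some other tool, which is the hard part and is not shown here. The record layout (`date`, `paragraphs`) and the function names are hypothetical choices of mine, not the CORD-19 schema.

```python
import json
from pathlib import Path


def bulletin_to_record(date, text):
    """Build one JSON-ready record from a bulletin's extracted text.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return {"date": date, "paragraphs": paragraphs}


def write_record(record, out_dir):
    """Write one record to <out_dir>/<date>.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{record['date']}.json"
    # ensure_ascii=False keeps Malayalam text readable instead of \u escapes.
    payload = json.dumps(record, ensure_ascii=False, indent=2)
    path.write_text(payload, encoding="utf-8")
    return path
```

Calling `write_record(bulletin_to_record("2020-06-01", text), "bulletins_json")` would produce one file per bulletin, which keeps each day's data easy to load and compare.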