JAGATH KATAMA
GitHub | LinkedIn | Portfolio | Gmail | Resume
EDUCATION
- Indiana University, Luddy School of Informatics, Computing, and Engineering, Indianapolis, IN
- Master of Science in Applied Data Science
- Expected Graduation: Dec 2025
- National Institute of Technology, Rourkela, Odisha, India
- Bachelor of Technology in Electronics and Communication Engineering
- Graduation: May 2023
PROFESSIONAL EXPERIENCE
Research Assistant - Gen AI and NLP Research | Health-Technology Lab | May 2025 - Present
- Working on a confidential medical project under the guidance of Prof. Hee-Tae Jung in the Health-Technology Lab, applying advanced NLP and generative AI techniques, with responsibilities spanning model development through production deployment.
- Provided feedback to Applied Data Science Master’s students on statistical analysis and R programming assignments, enhancing their conceptual and practical skills.
- Served as a Teaching Assistant for INFO-I 308: Information Representation (undergraduate course), instructing students in database design principles and SQL using MySQL and phpMyAdmin; evaluated student assignments and provided technical support.
DATA SCIENCE PROJECTS EXPERIENCE
- Developed Python code generation models using lightweight large language models (LLMs), including GPT-2, LLaMA 3.2, Gemma 3, and Qwen 2.5. Fine-tuned them on 200K+ Python code samples using Parameter-Efficient Fine-Tuning (PEFT) with LoRA and quantization techniques on multi-GPU setups (NVIDIA A16) using the Accelerate library.
- Designed and executed experiments to benchmark model architectures (124M–3B params), optimizing training via gradient checkpointing and evaluating outputs with cross-entropy loss and code-specific tasks.
- Identified performance tradeoffs across model families and exposed limitations in metrics like perplexity and CodeBLEU, contributing to research on efficient, local LLM-based code generation.
Transportation Company Database Design - MySQL Database Design | [GitHub] | Fall 2024
- Designed and developed a Transportation Management Database for managing bookings, vehicle and driver assignments, invoicing, and SaaS subscriptions for a road-based transport company.
- Implemented role-based access controls and business rules to ensure data security and system integrity, including constraints for vehicle and customer assignments.
- Optimized database structure through normalization (1NF to 3NF) to reduce redundancy and enhance data integrity.
- Created detailed documentation outlining database schema, relationships, business rules, and SQL setup instructions for seamless integration and use.
Apartment Recommender – Content-Based Filtering System | [GitHub] | Fall 2024
- Designed a content-based filtering model to analyze user preferences, generating personalized apartment recommendations (from nearly 10k apartments).
- Applied Principal Component Analysis (PCA) and pairwise similarity metrics, improving the efficiency of recommendation generation by 20% while reducing computational overhead.
- Implemented a count vectorizer to improve search capabilities, reducing search query processing time to 0.5 seconds, a 10% improvement over the initial baseline models.
- Deployed the model as an interactive web application using Streamlit, with visualizations to enhance user engagement.
Obesity Estimation Based on Lifestyle Variables - Statistical Analysis in R | [RPubs] | Fall 2024
- Formulated and tested hypotheses using ANOVA, chi-square tests, and t-tests to analyze the relationship between BMI and more than 12 lifestyle factors.
- Conducted regression analyses on multiple response variable types (continuous, count, proportion/rate) to evaluate the impact of lifestyle variables on obesity risk.
- Interpreted statistical findings for reliability and validity to derive actionable insights.
CERTIFICATIONS
- DeepLearning.AI TensorFlow Developer Specialization | Coursera | May 2024 | [PDF]
TECHNICAL SKILLS
- Programming languages: Python, R, SQL (Advanced), C/C++, Bash Scripting
- Developer Tools: TensorFlow, VS Code, PyCharm, PySpark, Tableau, Excel, Docker, RStudio
- Cloud Platform: AWS
- Big Data Engineering: Apache Spark, PySpark, Apache Kafka
- ML/AI Frameworks: TensorFlow, PyTorch, Hugging Face, Transformers
- Database: MySQL, SQLite, MongoDB, NoSQL, VectorDB
- Libraries: Pandas, NumPy, PyTorch, Keras, TensorFlow, Scikit-learn, Asyncio, Hugging Face, Seaborn, Requests, BeautifulSoup, Selenium, Transformers, Datasets, NLTK, spaCy, Accelerate, LangChain