
Ritayan Patra
Looking for Roles
Second-guessing hiring me? Dive into my GitHub and let the projects speak for themselves!
About

Data Engineer / Machine Learning Engineer
I am currently pursuing my Master's in Data Science at Columbia University, New York. I have worked as a Software Engineer (Data Engineer) at NatWest Group, India, where I developed and maintained data pipelines and automated testing processes. I am proficient in Python, Java, SQL, and various data engineering tools.
- Birthday: 19th March, 2000
- Website: rits98.github.io
- Phone: +1 332 250 5172
- Current Address: New York, NY, USA
- Home Address: Kolkata, WB, India
- Age: 25
- Degree: Master of Science in Data Science
- School: Columbia University, New York
- Official Email: rp3247@columbia.edu
- Personal Email: ritayanpatra98@gmail.com
Skills
Languages
- Python
- SQL
- PL/SQL
- Java
Libraries & Frameworks
- Pandas
- NumPy
- Matplotlib
- Seaborn
- PySpark
- Beautiful Soup
- Selenium
- Sklearn
- PyTorch
- Ray
Expertise
- Machine Learning
- Deep Learning
- Natural Language Processing (NLP)
- Large Language Models (LLM)
- Retrieval Augmented Generation (RAG)
- Data Engineering
- Data Modeling
- Data Warehousing
- Extract Transform Load (ETL)
- Probability & Statistics
Certifications
Oracle Cloud Infrastructure 2023 AI Certified Foundations Associate
Oracle
Issued February 2024
Oracle Cloud Infrastructure 2023 Certified Application Integration Professional
Oracle
Issued October 2023
Oracle Cloud Infrastructure 2023 Certified Foundations Associate
Oracle
Issued June 2023
Resume
Education
Master of Science in Data Science
September, 2024 - December, 2025
Columbia University, New York, USA
GPA: 3.6/4.0
Relevant Subjects:
- Spring 2025:
  - Machine Learning for Data Science (COMS4721W001)
  - Computer Systems for Data Science (CSEE4121W002)
  - Forecasting: A Real-World Application (IEOR4578E001)
  - Statistical Inference and Modeling (STAT5703GR001)
- Fall 2024:
  - Applied Machine Learning (COMS4995W032)
  - Applied Deep Learning (COMS4995W031)
  - Exploratory Data Analysis and Visualization (STATGR5702)
  - Probability and Statistics (STATGR5701)
  - Civil Engineering Research (CIEN9101E)
Bachelor of Technology in Electronics and Communication Engineering
July, 2018 - June, 2022
Vellore Institute of Technology, TN, India
GPA: 3.9/4.0
Relevant Subjects:
- Data Structures and Algorithms
- Database Management Systems
- Operating Systems
- Computer Networks
- Probability and Statistics
- Calculus
- Linear Algebra
Professional Experience
Research Assistant
May, 2025 - August, 2025
Columbia Engineering, New York, USA
- Developed and deployed an NLP pipeline on Columbia’s high-performance computing cluster to analyze the impact of 10,000 medical trials on surgical research
- Preprocessed and filtered 38M+ PubMed articles using BM25 to identify semantically related biomedical literature
- Generated dense embeddings with PubMedBERT and indexed them using FAISS for scalable and efficient similarity-based retrieval
- Built a lightweight Python interface for searching and retrieving related publications based on text similarity
- Provided weekly updates to faculty, ensuring progress aligned with research objectives and code was reproducible
Software Engineer
July, 2022 - July, 2024
NatWest Group, Gurugram, India
- Orchestrated and maintained data pipelines on Oracle Integration Cloud to streamline the bank's transaction and reference data processing, enhancing operational efficiency for downstream teams
- Developed and deployed new ETL data pipelines to migrate from a legacy Oracle Hyperion server to Oracle Integration Cloud, saving nearly £1 million annually in maintenance costs for the legacy system
- Automated data staging and transformation processes using SQL and Python, enabling seamless Oracle Database and Oracle ERP integration
- Designed and implemented a test automation suite using Java and Selenium for Oracle ERP systems, slashing manual effort by 80% and reducing testing time from two weeks to three days
- Implemented GitLab CI/CD pipelines with senior software engineers for smooth deployment of code and artifacts to SIT, UAT, and Production environments
Data Science Intern
September, 2020 - November, 2020
Nuclei Technologies, Navi Mumbai, India
- Acquired proficiency in R, Python, and data analysis libraries like Pandas, NumPy, Matplotlib, and Seaborn
- Analyzed employee attrition and hypermarket data using Pandas, NumPy, and Matplotlib to uncover key insights
Featured Projects
Explore my data engineering, machine learning, and AI projects showcasing end-to-end solutions
Contact
Address
Columbia University, Manhattan, New York, NY 10027
Call Me
+1 332 250 5172
Email Me
rp3247@columbia.edu
ritayanpatra98@gmail.com