Ritayan Patra

Looking for Roles


Second-guessing hiring me?
Dive into my GitHub and let the projects speak for themselves!

About

Data Engineer / Machine Learning Engineer

I am currently pursuing my Master's in Data Science at Columbia University, New York. I have worked as a Software Engineer (Data Engineer) at NatWest Group, India, where I developed and maintained data pipelines and automated testing processes. I am proficient in Python, Java, SQL, and various data engineering tools.

  • Birthday: 19th March, 2000
  • Website: rits98.github.io
  • Phone: +1 332 250 5172
  • Current Address: New York, NY, USA
  • Home Address: Kolkata, WB, India
  • Age: 25
  • Degree: Master of Science in Data Science
  • School: Columbia University, New York
  • Official Email: rp3247@columbia.edu
  • Personal Email: ritayanpatra98@gmail.com

Skills

Languages

  • Python
  • SQL
  • PL/SQL
  • Java

Libraries & Frameworks

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • PySpark
  • Beautiful Soup
  • Selenium
  • Scikit-learn
  • PyTorch
  • Ray

Expertise

  • Machine Learning
  • Deep Learning
  • Natural Language Processing (NLP)
  • Large Language Models (LLM)
  • Retrieval Augmented Generation (RAG)
  • Data Engineering
  • Data Modeling
  • Data Warehousing
  • Extract Transform Load (ETL)
  • Probability & Statistics

Tools & Platforms

  • Git
  • GitLab CI/CD
  • Oracle Integration Cloud
  • AWS
  • Apache Spark
  • Apache Hadoop
  • Apache Hive
  • Apache Kafka
  • Apache Iceberg
  • PostgreSQL
  • Oracle Database
  • Apache Airflow
  • MLflow
  • Docker

Certifications

Oracle Cloud Infrastructure 2023 AI Certified Foundations Associate

Oracle

Issued February 2024

Oracle Cloud Infrastructure 2023 Certified Application Integration Professional

Oracle

Issued October 2023

Oracle Cloud Infrastructure 2023 Certified Foundations Associate

Oracle

Issued June 2023

NatWest: Data Engineering Foundation Learning Track

DataCamp

Issued January 2023

Education

Master of Science in Data Science

September, 2024 - December, 2025

Columbia University, New York, USA

GPA: 3.6/4.0

Relevant Subjects:

  • Spring 2025:
    • Machine Learning for Data Science (COMS4721W001)
    • Computer Systems for Data Science (CSEE4121W002)
    • Forecasting: A Real-World Application (IEOR4578E001)
    • Statistical Inference and Modeling (STAT5703GR001)
  • Fall 2024:
    • Applied Machine Learning (COMS4995W032)
    • Applied Deep Learning (COMS4995W031)
    • Exploratory Data Analysis and Visualization (STATGR5702)
    • Probability and Statistics (STATGR5701)
    • Civil Engineering Research (CIEN9101E)

Bachelor of Technology in Electronics and Communication Engineering

July, 2018 - June, 2022

Vellore Institute of Technology, TN, India

GPA: 3.9/4.0

Relevant Subjects:

  • Data Structures and Algorithms
  • Database Management Systems
  • Operating Systems
  • Computer Networks
  • Probability and Statistics
  • Calculus
  • Linear Algebra

Professional Experience

Research Assistant

May, 2025 - August, 2025

Columbia Engineering, New York, USA

  • Developed and deployed an NLP pipeline on Columbia’s high-performance computing cluster to analyze the impact of 10,000 medical trials on surgical research
  • Preprocessed and filtered 38M+ PubMed articles using BM25 keyword ranking to identify topically related biomedical literature
  • Generated dense embeddings with PubMedBERT and indexed them using FAISS for scalable and efficient similarity-based retrieval
  • Built a lightweight Python interface for searching and retrieving related publications based on text similarity
  • Provided weekly updates to faculty, ensuring progress aligned with research objectives and code was reproducible

Software Engineer

July, 2022 - July, 2024

NatWest Group, Gurugram, India

  • Orchestrated and maintained data pipelines on Oracle Integration Cloud to streamline the bank's transaction and reference data processing, improving operational efficiency for downstream teams
  • Developed and deployed new ETL data pipelines to migrate from the legacy Oracle Hyperion server to Oracle Integration Cloud, saving nearly £1 million annually in legacy-system maintenance costs
  • Automated data staging and transformation processes using SQL and Python, enabling seamless Oracle Database and Oracle ERP integration
  • Designed and implemented a test automation suite using Java and Selenium for Oracle ERP systems, slashing manual effort by 80% and reducing testing time from two weeks to three days
  • Implemented GitLab CI/CD pipelines alongside senior software engineers to deploy code and artifacts smoothly to SIT, UAT, and Production environments

Data Science Intern

September, 2020 - November, 2020

Nuclei Technologies, Navi Mumbai, India

  • Acquired proficiency in R, Python, and data analysis libraries like Pandas, NumPy, Matplotlib, and Seaborn
  • Analyzed employee attrition and hypermarket data using Pandas, NumPy, and Matplotlib to uncover key insights

Featured Projects

Explore my data engineering, machine learning, and AI projects showcasing end-to-end solutions

Apple AppStore Analytics with Simulated A/B Testing

Tech Stack: Python, Docker, Apache Superset, PostgreSQL

Key Features: Advanced Data Analysis, A/B Testing, Business Intelligence

Conversational RAG over PDFs

Tech Stack: Python, Streamlit, LangChain, LangSmith, Chroma Vector DB, Ollama, Groq API

Key Features: Multi-document chat, Vector embeddings, Real-time conversation history

URL Summarizer Using LangChain + GROQ

Tech Stack: Python, Streamlit, LangChain, LangSmith, Ollama, Groq API

Key Features: Web scraping, Content extraction, AI-powered summarization

Near Real-time Stock Forecasting Platform

Tech Stack: Docker, Airflow, PostgreSQL, AWS (S3, Glue, Athena, DynamoDB, Kinesis), MLflow, Ray, PyTorch, Streamlit

Key Features: Real-time streaming, Deep learning models (CNN, LSTM), MLOps pipeline

Reddit API Data Pipeline

Tech Stack: Docker, Python, Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena

Key Features: Automated ETL, REST API integration, Cloud data warehousing

IoT Vehicle Data Engineering Pipeline

Tech Stack: Docker, Python, Apache Kafka, Apache Spark, AWS Glue, Amazon Redshift

Key Features: Stream processing, Real-time analytics, Scalable architecture

Hadoop + Hive + PostgreSQL Dockerized Setup

Tech Stack: Docker, Apache Hadoop, Apache Hive, PostgreSQL, Hue

Key Features: Multi-container orchestration, Big data processing, Web-based query interface

Airbnb Stream Data Ingestion

Tech Stack: Python, AWS S3, AWS Lambda, EventBridge, SQS

Key Features: Event-driven architecture, Serverless processing, Real-time ingestion

IMDB Movie Ingest and Process

Tech Stack: Python, AWS S3, AWS Glue, AWS Glue Data Catalog

Key Features: Data Engineering, Cloud, ETL

Sales Data Engineering and Analysis

Tech Stack: Python, AWS S3, AWS Lambda, DynamoDB, Kinesis, Athena

Key Features: Change Data Capture, Data Engineering, Event-driven

Airline Passenger Referral Prediction

Tech Stack: Python, AWS S3, AWS Glue, Step Functions, Amazon Redshift

Key Features: Multiple ML algorithms, Hyperparameter tuning, Cloud-based ML pipeline

Customer Churn Prediction with Deep Learning

Tech Stack: Python, PyTorch, PyTorch Lightning, Pandas, Scikit-learn, ONNX, Optuna, Streamlit

Key Features: Neural networks, Automated hyperparameter tuning, Interactive web app

Sarcasm Detection with NLP

Tech Stack: Python, Pandas, Keras, Scikit-learn, Hugging Face Transformers

Key Features: BERT embeddings, LSTM models, Text classification, OpenAI embeddings

Food Delivery Time Prediction

Tech Stack: Python, Pandas, Scikit-learn, Matplotlib

Key Features: Data Analysis, Machine Learning

Automated Weather Data Pipeline

Tech Stack: Docker, Apache Airflow, dbt, PostgreSQL, Apache Superset

Key Features: End-to-end automation, Data modeling, Interactive dashboards

Music Store Data Analysis with SQL

Tech Stack: Python, PostgreSQL, SQL

Key Features: Data modeling (Star Schema), Business intelligence, Complex queries

Energy Consumption Analysis & Visualization

Tech Stack: R Programming, D3.js, Statistical Analysis

Key Features: Time series analysis, Interactive visualizations, Statistical modeling

Testimonials

It's rare to come across someone as motivated and enthusiastic as Ritayan. He consistently went above and beyond to ensure that our team met its targets. His positive attitude and strong work ethic were contagious, and he always brought fresh ideas to the table.

Ritayan is an exceptional developer who possesses all the skills one would want in an excellent software developer. From Python to Java and integration to automation, he has mastered the top programming languages and technologies. He has been a great resource to my company. He did an incredible job on all projects, delivering on time and never hesitating to learn new things when the work demanded it. His work is always top-notch, and he always welcomes feedback and makes improvements. Plus, Ritayan is self-motivated and a great team player.

Prasath C.

Vice President, NatWest Group

Contact

Address

Columbia University, Manhattan, New York, NY 10027

Call Me

+1 332 250 5172

Email Me

rp3247@columbia.edu

ritayanpatra98@gmail.com