Hi, I’m Aayon

Welcome To The World Of Data

Who Am I

I’m a passionate Data Scientist with a focus on leveraging advanced data science techniques such as machine learning, deep learning, and predictive analytics to solve complex problems across various domains.

Throughout my career, I have built and deployed scalable machine learning models, optimized data pipelines, and contributed to cloud-based architectures that support efficient, data-driven solutions.

With expertise in Python, R, SQL, TensorFlow, Docker, and cloud platforms like AWS and Azure, I specialize in creating solutions that drive business decisions and innovation.

I am committed to transforming raw data into meaningful insights and helping businesses make data-driven decisions with clarity and precision.

Work Experience

Data Scientist | Keen & Able

(Feb 2022 - Aug 2023) Noida, India
  • Led a predictive analytics project that developed machine learning models using Python and Scikit-learn to forecast customer churn, reducing churn rates by 18% and improving retention strategies across multiple departments.
  • Collaborated with Data Engineers to build and optimize a scalable ETL pipeline using SQL, Airflow, and AWS, reducing processing times by 35% and enhancing data accessibility for advanced analytics and business intelligence.
  • Implemented A/B testing frameworks to analyze marketing campaigns, applying statistical models to generate actionable insights that led to a 15% increase in customer engagement and an 8% rise in revenue from targeted campaigns.
  • Developed an interactive dashboard in Tableau that visualized real-time customer behavior data, empowering the sales and marketing teams to make data-driven decisions, ultimately boosting cross-selling opportunities by 12%.
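The A/B testing bullet doesn't name the statistical test that was used. As a minimal sketch, assuming the campaign metric is a conversion rate, a two-sided two-proportion z-test needs only the standard library; the function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled proportion under H0: p_a == p_b. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: control (A) vs. variant (B)
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z means the variant converted better; whether p clears a threshold such as 0.05 should be decided before the test runs, not after.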

Data Analyst Intern | Acxiom Consulting

(Sep 2021 - Jan 2022) Gurugram, India
  • Conducted exploratory data analysis (EDA) using pandas to identify customer usage patterns and churn predictors in telecom data, contributing to a 28% improvement in customer retention strategies.
  • Built and visualized sentiment analysis models using pandas and NLTK to process customer feedback from telecom service surveys, achieving a 94% accuracy rate and enabling targeted service enhancements.
  • Designed real-time data validation pipelines with MySQL and pandas, reducing data errors by 56%.
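The validation rules behind that pipeline aren't described, and the original work used MySQL and pandas. As a plain-Python sketch under assumed rules (required fields, a plan drawn from a known set, a non-negative usage value), record-level validation that separates clean rows from rejected ones could look like this; the schema and field names are hypothetical.

```python
from typing import Any

# Hypothetical schema for telecom usage records
REQUIRED_FIELDS = ("customer_id", "plan", "monthly_minutes")
VALID_PLANS = {"prepaid", "postpaid"}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return human-readable errors for one record (an empty list means valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    plan = record.get("plan")
    if plan not in (None, "") and plan not in VALID_PLANS:
        errors.append(f"unknown plan {plan!r}")
    minutes = record.get("monthly_minutes")
    if minutes not in (None, ""):
        if not isinstance(minutes, (int, float)) or minutes < 0:
            errors.append(f"invalid monthly_minutes {minutes!r}")
    return errors


def split_valid(records):
    """Partition records into (valid, rejected); each rejected item keeps its error list."""
    valid, rejected = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            rejected.append((rec, errs))
        else:
            valid.append(rec)
    return valid, rejected


rows = [
    {"customer_id": "C1", "plan": "prepaid", "monthly_minutes": 320},
    {"customer_id": "C2", "plan": "gold", "monthly_minutes": -5},
    {"customer_id": "", "plan": "postpaid", "monthly_minutes": 80},
]
valid, rejected = split_valid(rows)
```

Keeping the error list next to each rejected row makes it easy to count error types over time, which is what a claim like "reducing data errors" would be measured against.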

Projects


Chicken Disease Classification

Deep Learning-Based Chicken Disease Classification: The project uses deep learning techniques to preprocess image data and train a model to classify chicken diseases. It incorporates a pipeline for data preprocessing and model training with version control managed via DVC.

CI/CD Integration and Containerization: GitHub Actions automates continuous integration and deployment, using Docker for containerization. The deployment workflows ensure seamless building, testing, and running of the application in cloud environments.

Cloud Deployment on AWS and Azure: The project supports deployment on AWS EC2 and Azure Web App. On AWS, Amazon ECR stores the Docker image and EC2 runs the container; on Azure, Azure Container Registry stores the image and an Azure Web App hosts the application.

 


Student Score Prediction

Exploratory Data Analysis (EDA): The project analyzes math score distributions across demographics, identifies correlations with socio-economic factors, and detects patterns and outliers to inform feature selection.

Data Transformation Pipeline: A preprocessing pipeline handles missing values, scales numerical data, and encodes categorical features using ColumnTransformer, ensuring high-quality and consistent input for machine learning.

Model Training and Deployment: Models are trained and tuned via grid search, evaluated with MAE and RMSE, and prepared for deployment on AWS Elastic Beanstalk.
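For reference, the two evaluation metrics are MAE = (1/n) Σ |y_i − ŷ_i| and RMSE = √((1/n) Σ (y_i − ŷ_i)²). A minimal standard-library sketch, with made-up scores standing in for real predictions:

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the misses, in the target's units."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: squares each miss first, so large misses weigh more."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


scores = [72, 65, 90, 58]  # hypothetical true math scores
preds = [70, 68, 85, 60]   # hypothetical model predictions
print(f"MAE={mae(scores, preds):.2f}, RMSE={rmse(scores, preds):.2f}")  # MAE=3.00, RMSE=3.24
```

RMSE is always at least as large as MAE; a wide gap between them signals a few large errors rather than many small ones, which is why reporting both is informative.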


Advanced RAG Agent

Advanced PDF Information Retrieval with Semantic Understanding: Implemented a robust pipeline for extracting and preprocessing content from PDFs, utilizing Optical Character Recognition (OCR) for scanned documents and embedding-based methods to semantically structure data. This ensures accurate alignment with large language models (LLMs) for advanced query answering.

Contextual Compression for Optimal Data Ingestion: Designed and integrated contextual summarization techniques, leveraging transformer-based architectures to reduce irrelevant data and noise while retaining critical semantic context. This preprocessing step enhances the efficiency and relevance of inputs fed into LLMs.

Enhanced Query Response via Adaptive Data Preprocessing: Combined hierarchical embeddings with fine-tuned LLMs to dynamically extract and rank document segments based on user query intent, enabling precise, context-aware answers directly from large PDF datasets.
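The embedding model and ranking logic aren't specified. The core retrieval step (score every document segment against the query by cosine similarity, then keep the top k) can be sketched with toy term-count vectors standing in for learned embeddings. That substitution is an assumption for illustration only; a real pipeline would call an embedding model, and the chunks below are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_segments(query: str, segments: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank segments by similarity to the query and return the k best (score, text) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "The warranty covers manufacturing defects for two years.",
    "Invoices are issued at the end of each billing cycle.",
    "Defects caused by misuse are not covered by the warranty.",
]
top = top_k_segments("What defects does the warranty cover?", chunks)
for score, text in top:
    print(f"{score:.2f}  {text}")
```

Only the top-ranked segments are passed to the LLM as context, which is what keeps answers grounded in the PDF rather than in the model's general knowledge.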


House Price Predictor - Advanced

In-Depth Exploratory Data Analysis (EDA): Conducted thorough EDA to uncover data patterns and relationships, forming the foundation for effective feature engineering and model development.

Advanced Feature Engineering and Model Validation: Developed meaningful features and implemented a robust model with rigorous testing and validation, ensuring high predictive performance.
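The validation scheme isn't specified. One common way to validate a model rigorously is k-fold cross-validation; as a hypothetical sketch, the standard-library generator below produces train/validation index splits (the function name and default seed are assumptions, not details from the project).

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 42) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


for train, val in k_fold_indices(10, k=3):
    print(len(train), len(val))  # 6 4, then 7 3, then 7 3
```

Averaging a metric such as RMSE across the folds gives a less optimistic estimate than a single train/test split.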

MLOps Integration with ZenML and MLflow: Utilized ZenML and MLflow for seamless experiment tracking and deployment, exemplifying proficiency in modern MLOps practices.


Protein Structure Prediction with Custom AlphaFold

Custom Implementation of AlphaFold Architecture: Recreated the AlphaFold model using Python and PyTorch, focusing on the intricate neural network layers and operations essential for accurate protein structure prediction.

Integration of Multiple Sequence Alignments and Structural Templates: Utilized multiple sequence alignments and structural templates to enhance the accuracy of protein folding predictions, effectively capturing evolutionary relationships and structural constraints.

Deployment of End-to-End Inference Pipeline: Established a streamlined inference pipeline capable of processing raw protein sequences to generate precise 3D structural models, facilitating practical applications in computational biology.
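How the pipeline ingests raw sequences isn't described. A typical first step in sequence-to-structure models, AlphaFold-style ones included, is a per-residue amino-acid-type feature; the sketch below one-hot encodes a sequence over the 20 standard amino acids plus an "unknown" class. The alphabet ordering and function name are assumptions for illustration.

```python
# 20 standard amino acids in one-letter code; this ordering is an arbitrary choice
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNKNOWN_INDEX = len(AMINO_ACIDS)  # index 20 collects non-standard residues such as 'X'
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sequence(sequence: str) -> list[list[int]]:
    """Encode a protein sequence as an L x 21 one-hot matrix (20 residues + unknown)."""
    encoded = []
    for residue in sequence.strip().upper():
        row = [0] * (len(AMINO_ACIDS) + 1)
        row[AA_TO_INDEX.get(residue, UNKNOWN_INDEX)] = 1
        encoded.append(row)
    return encoded


features = one_hot_sequence("MKTAYIAKQRX")
print(len(features), len(features[0]))  # 11 21
```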


Industrial Anomaly Detection with Advanced Deep Learning

Variational Autoencoder for Anomaly Detection: Implemented a Variational Autoencoder (VAE) using Python and PyTorch to reconstruct high-dimensional data and identify anomalies through reconstruction errors.
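Once a VAE has been trained on normal samples, the anomaly decision reduces to thresholding reconstruction error. The sketch below assumes the reconstructions are already available (the model itself is omitted) and fits a mean-plus-k-standard-deviations threshold on errors from normal data; that threshold rule and k = 3 are assumptions, not necessarily what the project used, and all numbers are made up.

```python
import statistics
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean + k * (population) std of errors observed on normal data."""
    return statistics.fmean(normal_errors) + k * statistics.pstdev(normal_errors)


def is_anomaly(x: Sequence[float], x_hat: Sequence[float], threshold: float) -> bool:
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x, x_hat) > threshold


# Hypothetical errors measured on held-out normal samples
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(normal_errors)

x = [0.2, 0.5, 0.9]
good_reconstruction = [0.21, 0.49, 0.88]  # model reproduces the input well
bad_reconstruction = [0.6, 0.1, 0.3]      # model fails: input unlike the training data
print(is_anomaly(x, good_reconstruction, threshold), is_anomaly(x, bad_reconstruction, threshold))
```

The idea is that a model trained only on normal data reconstructs normal inputs well and unfamiliar, defective inputs poorly.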

Feature Extraction and Classification: Utilized ResNet-based feature extraction combined with K-Nearest Neighbors (KNN) for accurate anomaly classification, leveraging pre-trained networks for better feature representation.

Anomaly Localization with PatchCore: Applied the PatchCore method for effective anomaly localization, identifying defective regions within industrial images to support manufacturing quality control.
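PatchCore and the kNN classification above share one scoring idea: store feature vectors from defect-free images in a memory bank, score each test patch by its distance to the nearest stored feature, and take the maximum patch score as the image-level anomaly score. The per-patch scores double as the localization map. The sketch shows only that scoring step with plain lists; it assumes patch features have already been extracted (for example from a pre-trained ResNet), omits PatchCore's coreset subsampling and score reweighting, and uses made-up 2-D vectors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def nearest_distance(feature: Vector, memory_bank: Sequence[Vector]) -> float:
    """Euclidean distance from one patch feature to its nearest neighbour in the memory bank."""
    return min(math.dist(feature, m) for m in memory_bank)


def score_image(patch_features: Sequence[Vector], memory_bank: Sequence[Vector]) -> tuple[float, list[float]]:
    """Return (image_score, per_patch_scores); per-patch scores form the localization map."""
    patch_scores = [nearest_distance(f, memory_bank) for f in patch_features]
    return max(patch_scores), patch_scores


memory_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # features from normal images
test_patches = [[0.1, 0.0], [0.9, 0.1], [3.0, 4.0]]  # last patch is far from anything normal
image_score, patch_scores = score_image(test_patches, memory_bank)
print(round(image_score, 3), patch_scores.index(image_score))
```

Here the third patch gets the highest score, so it would be highlighted as the defective region.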


Testimonials

See What My Clients Are Saying

Aayon's interactive Tableau dashboards were a game-changer for our sales and marketing teams. His ability to visualize real-time customer behavior empowered data-driven decision-making, leading to a 12% boost in cross-selling opportunities. Aayon truly excels at bridging the gap between data and actionable insights.

Puneet Gupta

Vice President of Sales at Keen & Able

Aayon is an exceptional data scientist who combines technical prowess with strategic insight. At Keen & Able, his predictive analytics project reduced customer churn by 18%, showcasing his ability to drive impactful solutions across departments. His expertise in Python and Scikit-learn is unmatched, making him a true asset to any team.

Nalini Nautial Sharma

Sr. Director (IT) at NIC

Aayon's project on chicken disease classification is a testament to his creativity and technical expertise. Using TensorFlow and CI/CD pipelines, he delivered a scalable and reliable deep learning solution. His dedication to deploying impactful models sets him apart as a forward-thinking data scientist.

Vijay Sethi

Chairman and Chief Mentor at MentorKart

Aayon's collaborative spirit and problem-solving skills made a tremendous impact at Keen & Able. His implementation of A/B testing frameworks improved customer engagement by 15% and drove an 8% revenue increase. His ability to work across teams is truly inspiring.

Bidyut Baran Mukherjee

CIO at Keen & Able

Working with Aayon was a transformative experience. His collaboration with data engineers to build and optimize ETL pipelines significantly improved processing times by 35%. His innovative approach and mastery of SQL, Airflow, and AWS turned complex data challenges into seamless solutions.

Varad Gupta

CTO at Keen & Able

Recommended Research Papers

Diffusion Modeling

Denoising Diffusion Probabilistic Models (DDPMs) learn to reverse a Markov chain that gradually adds Gaussian noise to data, producing high-quality samples for generative tasks. Latent Diffusion Models (LDMs) optimize this by operating in compressed latent spaces, reducing computational costs while maintaining fidelity. These approaches enable versatile applications like image synthesis, super-resolution, and text-to-image generation.
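The forward (noising) half of a DDPM has a closed form: with a variance schedule β_t, α_t = 1 − β_t and ᾱ_t = ∏ α_s, a noisy sample at step t is x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε with ε ~ N(0, I). The sketch below implements that step for a plain list of values using the linear schedule from the original DDPM paper (β from 1e-4 to 0.02 over T = 1000); the learned reverse (denoising) model is not shown, and steps are 0-indexed here.

```python
import math
import random


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_1..beta_T (stored 0-indexed)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0: list[float], t: int, a_bar: list[float], rng: random.Random) -> list[float]:
    """Draw x_t ~ q(x_t | x_0) in one shot via the closed-form marginal."""
    signal = math.sqrt(a_bar[t])
    noise_scale = math.sqrt(1.0 - a_bar[t])
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule()
a_bar = alpha_bars(betas)
rng = random.Random(0)
x0 = [0.5, -0.2, 0.8]
print(q_sample(x0, 10, a_bar, rng))   # still close to x0
print(q_sample(x0, 999, a_bar, rng))  # essentially pure Gaussian noise
```

Because any x_t can be sampled directly from x_0, training can pick a random t per example instead of simulating the whole chain.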

Llama 3 Advancements

Llama 3 introduces large-scale transformer models (up to 405B parameters) for multilingual and multimodal AI tasks. Enhanced with rigorous data preprocessing, scaling, and safety measures, it is competitive with state-of-the-art models on language and reasoning benchmarks. Ongoing multimodal integrations (image, video, speech) expand the foundational scope of AI applications.

Generative Scaling

Both diffusion models and Llama 3 emphasize scalability in terms of data, architecture, and computational resources. Innovations include latent space optimizations in diffusion models and advanced scaling laws in Llama 3 to balance model size and compute efficiency.
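A common rule of thumb in the scaling-law literature is that training a dense transformer costs roughly C ≈ 6·N·D floating-point operations, where N is the parameter count and D the number of training tokens. The helpers below apply that approximation in both directions; the token count in the example is a hypothetical placeholder, not a figure from the Llama 3 paper.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute C ~= 6 * N * D for a dense transformer."""
    return 6.0 * n_params * n_tokens


def tokens_for_budget(flops_budget: float, n_params: float) -> float:
    """Invert the rule: how many tokens a compute budget buys at a given model size."""
    return flops_budget / (6.0 * n_params)


n_params = 405e9  # 405B parameters, the largest Llama 3 size mentioned above
n_tokens = 10e12  # hypothetical 10T training tokens, for illustration only
print(f"{training_flops(n_params, n_tokens):.2e}")  # 2.43e+25
```

Trading N against D at a fixed C is exactly the balance between model size and compute efficiency that scaling laws try to optimize.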

THE DATA-VERSE

Enter the World of Interactive Space

Embark on an intergalactic journey where you are the captain of a data exploration rocket, navigating through the mysterious realms of the DataVerse. Each planet holds secrets of data science waiting to be uncovered—patterns, clusters, algorithms, and more! Collect knowledge, solve challenges, and unlock new galaxies as you hone your data science skills.

  • Use W, A, S, D to move your rocket through space.
  • Adjust direction with arrow keys to explore new planets.

Skills

Programming Languages

Python, R, SQL, C/C++

Machine Learning & Deep Learning

CNN, RNN, Clustering, XGBoost, Graph Learning, Explainable AI (XAI), Forecasting, Image Processing, NLP

Data Science Tools

Tableau, Power BI, D3.js, Matplotlib, T-tests, ANOVA, Regression Analysis

Cloud And DevOps

AWS, Azure, Docker, Kubernetes, Apache Airflow, CI/CD, Git

Latest Articles

AI and Data Science: An Uncharted Journey Through Innovation, Ethics, and Unexpected Consequences

As the team reviews case studies from industries across the globe, they realize AI isn’t just another tool; it’s a transformative force.

Exploring the Power of Diffusion and Language Models: The Cutting-Edge Synthesis and Few-Shot…

In the rapidly evolving field of artificial intelligence, innovative model architectures are continuously pushing the boundaries of what machines can achieve. Two critical areas shaping the landsca...

Emerging Data Science Trends: A Comprehensive Overview

Data science continues to revolutionize industries by unlocking insights hidden within vast amounts of data. As we progress through 2023, several trends are shaping the future of data science. This...

Contact Me
