AI & Data Pipeline Engineer · NYC

Building ML-driven data systems for financial services.

I'm Abir, an AI & Data Pipeline Engineer focused on credit risk modeling, explainable AI, and cloud-native ETL architecture.

Previously: Fixnox, HKexcel Education. MS in Financial Technology from UConn.

Abir Shah

What I deliver

0.77
ROC AUC

Credit risk classifier on 10K synthetic records

23K+
Trade Logs Generated

Realistic FIX protocol messages for ETL testing

80%+
Student Achievement

Consistent scores across 70+ IB students

3
Cloud Pipelines

End-to-end AWS systems shipped to production

How I support engineering teams

  • Build end-to-end ML pipelines with explainable AI for regulatory transparency in financial services.
  • Design cloud-native ETL architectures on AWS following medallion patterns for progressive data refinement.
  • Deliver production-grade data systems with structured logging, validation, and error handling.

Professional experience

AutoShopIQ

AI & Data Pipeline Engineer

Sept 2025 – Present · Stamford, CT

  • Architected an end-to-end, event-driven data ingestion pipeline using AWS S3, Lambda, and IAM to capture, process, and store automotive repair documents.
  • Built Document AI workflows leveraging Amazon Textract, OCR tooling, and Python to extract and normalize unstructured repair data into canonical JSON schemas.
  • Designed AI-ready data pipelines integrating validation, transformation, and PostgreSQL storage for downstream ML and recommendation systems.
  • Improved pipeline reliability with structured logging, error handling, and retry mechanisms for real shop environments.
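The reliability bullet above (structured logging, error handling, retries) can be illustrated with a minimal, standard-library sketch. It is not the production code: `log_event`, `with_retry`, the event names, and the backoff parameters are all hypothetical and chosen only for illustration.

```python
import json
import logging
import time

logger = logging.getLogger("pipeline")  # hypothetical logger name


def log_event(level, event, **fields):
    """Emit one JSON object per log line so logs stay machine-parseable."""
    logger.log(level, json.dumps({"event": event, **fields}, sort_keys=True))


def with_retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure, back off exponentially and try again.

    The last failure is re-raised so the caller can route the document
    to whatever error handling it uses (assumed, not shown here).
    """
    for attempt in range(1, attempts + 1):
        try:
            result = func()
            log_event(logging.INFO, "step_succeeded", attempt=attempt)
            return result
        except Exception as exc:
            log_event(logging.WARNING, "step_failed",
                      attempt=attempt, error=str(exc))
            if attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would wrap one pipeline step, for example `with_retry(lambda: extract(document))`, where `extract` stands in for any step that can fail transiently.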

Fixnox

Data Engineer Intern

Jan 2024 – Dec 2024 · Sydney, Australia (Remote)

  • Architected a batch ETL pipeline on AWS (S3, Glue, Athena) to ingest and analyze FIX protocol trade messages following a medallion architecture.
  • Developed a Python-based FIX message data generator producing ~23,000 realistic trade logs partitioned in Hive-style format.
  • Engineered ETL transformations with PySpark and Pandas to parse raw CSV trade logs into optimized Parquet with derived trading metrics.
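As a rough illustration of the generator and partitioning described above, the sketch below builds FIX tag=value messages with a computed BodyLength (tag 9) and CheckSum (tag 10) and derives a Hive-style partition path. The company IDs, symbols, price range, fixed timestamp, and the `year=/month=/day=` partition keys are assumptions for illustration, and the messages are simplified rather than complete, session-valid orders.

```python
import random
from datetime import date

SOH = "\x01"  # FIX field delimiter (ASCII 0x01)


def build_fix_message(fields, begin_string="FIX.4.2"):
    """Assemble a tag=value message with BodyLength (9) and CheckSum (10)."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength counts the bytes after the 9= field up to the CheckSum field.
    header = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum is the byte sum of everything before 10=, modulo 256.
    checksum = sum((header + body).encode("ascii")) % 256
    return f"{header}{body}10={checksum:03d}{SOH}"


def random_order(rng, trade_date, seq_num):
    """Generate one simplified NewOrderSingle (35=D) with made-up values."""
    fields = [
        (35, "D"),                                # MsgType
        (34, seq_num),                            # MsgSeqNum
        (49, "CLIENT1"),                          # SenderCompID (hypothetical)
        (52, f"{trade_date:%Y%m%d}-09:30:00"),    # SendingTime (fixed time)
        (56, "BROKER1"),                          # TargetCompID (hypothetical)
        (11, f"ORD{seq_num:06d}"),                # ClOrdID
        (55, rng.choice(["AAA", "BBB", "CCC"])),  # Symbol (placeholders)
        (54, rng.choice(["1", "2"])),             # Side: 1=Buy, 2=Sell
        (38, rng.randrange(100, 1100, 100)),      # OrderQty
        (40, "2"),                                # OrdType: 2=Limit
        (44, f"{rng.uniform(10, 200):.2f}"),      # Price
    ]
    return build_fix_message(fields)


def partition_path(base, trade_date):
    """Hive-style key=value directory layout (partition keys are assumed)."""
    return (f"{base}/year={trade_date.year}"
            f"/month={trade_date.month:02d}/day={trade_date.day:02d}")
```

Seeding the generator, for example `random_order(random.Random(42), date(2024, 1, 15), 1)`, makes the test data reproducible across runs.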

HKexcel Education

Physics & Mathematics Instructor

Nov 2019 – Jul 2023 · Hong Kong

  • Delivered instruction to 70+ IB Mathematics and Physics students, resulting in consistent scores above 80%.
  • Translated complex concepts into digestible modules, increasing student comprehension and achievement by 25%.

Featured work

Credit Risk Scoring Engine with Explainable AI

End-to-end credit risk classifier using XGBoost on 10,000 synthetic credit records with SHAP explainability for regulatory transparency.

  • ROC AUC of 0.77 on 2,000-record test set
  • SHAP waterfall plots, force plots, and beeswarm visualizations
  • Addresses ECOA, FCRA, and GDPR Article 22 requirements
  • Interactive Streamlit dashboard with real-time scoring
Python · XGBoost · SHAP · Streamlit · scikit-learn
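For readers unfamiliar with the metric: ROC AUC is the probability that a randomly chosen positive case (here, a risky borrower) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition in plain Python. It is illustrative only and not the project's evaluation code; the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of model scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # Each correctly ordered pair counts 1, each tied pair counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the worst case, which is fine for a few thousand records; a rank-based implementation would be the usual choice for larger sets.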

Monte Carlo Portfolio Optimization

Simulated buy-and-hold versus monthly rebalancing strategies on S&P 500 data (2017–2022); monthly rebalancing improved the risk-return tradeoff.

  • Analyzed stock investment strategies and risk-return tradeoffs
  • Monthly rebalancing improved portfolio performance
Python · NumPy · Pandas · Matplotlib · Monte Carlo
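The comparison can be sketched with a small standard-library simulation. This is a toy version, not the project's code: it draws independent normal monthly returns for two hypothetical assets instead of using S&P 500 data, and the means, volatilities, weights, and 72-month horizon are assumed values.

```python
import random
import statistics


def simulate_path(rng, months, mu, sigma):
    """Draw one path of monthly returns per asset (independent normals)."""
    return [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
            for _ in range(months)]


def terminal_value(path, weights, rebalance):
    """Grow a portfolio worth 1.0 along a path of monthly returns."""
    holdings = list(weights)
    for returns in path:
        holdings = [h * (1 + r) for h, r in zip(holdings, returns)]
        if rebalance:
            # Reset each position to its target share of current value.
            total = sum(holdings)
            holdings = [total * w for w in weights]
    return sum(holdings)


def compare_strategies(n_paths=1000, months=72, seed=0):
    """Return (mean, stdev) of terminal value for both strategies."""
    rng = random.Random(seed)
    # Hypothetical monthly parameters for a risky and a defensive asset.
    mu, sigma, weights = (0.008, 0.003), (0.05, 0.01), (0.6, 0.4)
    hold, rebal = [], []
    for _ in range(n_paths):
        path = simulate_path(rng, months, mu, sigma)
        hold.append(terminal_value(path, weights, rebalance=False))
        rebal.append(terminal_value(path, weights, rebalance=True))
    return {
        "buy_and_hold": (statistics.mean(hold), statistics.stdev(hold)),
        "monthly_rebalance": (statistics.mean(rebal), statistics.stdev(rebal)),
    }
```

Both strategies are evaluated on the same simulated paths, so any difference in the summary statistics comes from the rebalancing rule rather than from sampling noise between runs.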

Predicting Problematic Internet Use

Classification model for adolescent internet risk using the HBN dataset from the Child Mind Institute.

  • Built on the Healthy Brain Network (HBN) dataset
  • F1 score of 0.73 for internet addiction risk prediction
Python · scikit-learn · Deep Learning · Pandas

Academic background

University of Connecticut

Aug 2023 – May 2025 · Storrs, CT

MS Financial Technology

Key Coursework

Deep Learning · Data Mining · Time Series Forecasting · Predictive Modelling · Financial Programming & Modelling

City University of Hong Kong

Sept 2015 – Oct 2019 · Hong Kong

BEng Mechatronic Engineering

Certifications

Associate Data Engineer in SQL

DataCamp

Python & Statistics for Financial Analysis

Coursera

Python for Data Science, AI & Development

Coursera

Multivariate Calculus for Machine Learning

Coursera

Technical stack

Tools and platforms I use most in data engineering and ML work.

Programming & Data

Python · R · SQL · MySQL · PostgreSQL · Git · Shell

Machine Learning & AI

scikit-learn · XGBoost · SHAP · TensorFlow · PyTorch · NLP · Deep Learning

Cloud & Infrastructure

AWS S3 · AWS Glue · AWS Lambda · EC2 · Athena · Airflow · IAM · Textract

Data & Visualization

Pandas · NumPy · PySpark · Matplotlib · Seaborn · Power BI · Tableau · Streamlit

Financial Domain

Credit Risk Modeling · Monte Carlo Simulation · Portfolio Optimization · FIX Protocol · Bloomberg Terminal

Let's connect

Open to opportunities in data engineering, ML engineering, and fintech. Feel free to reach out.