Sneha Roy


  • 9 years of data science, advanced analytics, and engineering experience at McKinsey, Humana, Bank of America, and other global firms
    • 6 years focusing on scalable predictive modeling, experimentation, and user behavior analytics
    • 3 years focusing on building data architectures for Enterprise-level solutions
  • Led multiple mid- to large-scale projects while serving as a data scientist and engineer and as a liaison for cross-functional collaboration; comfortable interacting with senior leadership and business stakeholders
  • 4.0 GPA in Statistical Modelling and Big Data Analytics Master’s degree
  • Microsoft Certified Azure Data Scientist


Statistical & Probabilistic Modelling   Deep & Reinforcement Learning   Natural Language Processing
Time Series Modelling   Big Data Processing Frameworks   Data Analysis & Visualization

Coding: Python, Spark, R, PySpark, Hadoop, C++, JavaScript, PHP, HTML, CSS, SQL, Bash Scripting

Tools: AzureML, MLflow, Databricks, GCP, Jupyter, Matplotlib, Bokeh, Natural Language Toolkit (NLTK), NumPy, Pandas, scikit-learn, SciPy, H2O, TensorFlow, PyTorch, Flask, SHAP, RStudio, ggplot, Shiny, Git, Hive, MapReduce, Spark SQL, PostgreSQL, BigQuery, MS SQL Server, Tableau, Power BI, KNIME, Splunk, Dynatrace, App Insights


McKinsey & Company   Lead Data Scientist   Boston, USA   Jul 2022 – Present
  • Built a linguistic model to generate real-time insights from free-text user interactions about firm technology; this provided Enterprise technology teams with a self-serve solution to understand their users and personalize their solutions and experiences; the self-serve model has been adopted by 5 product teams and more are being onboarded
  • Built an attribution model to quantify performance drivers and detractors of 30+ journeys; built a complementary model to predict user satisfaction over time; this helped leaders make data-driven decisions about resource allocation and effort prioritization, leading to a 4-percentage-point improvement in overall user satisfaction
Humana Insurance Company   Senior Data Scientist   Boston, USA   Aug 2019 – Jul 2022
  • Built a multi-output predictive model to show first-time site visitors an estimate on immediate and annual costs using minimal inputs; this transparency resulted in an 18% increase in session to enrollment conversion rate for Humana across the US
  • Classified interactions between users and their virtual care assistants, identified proactive interactions, and isolated reasons for escalation using natural language processing and classification algorithms such as decision trees, nearest neighbors, and neural networks
  • Applied advanced statistical techniques to design an experiment that removed bias from the test and control group split, a critical need given the high chance of interference
  • Worked closely with engineers to get data models into DevOps pipelines so that any updates to the model could be directly reflected in the application resulting in uninterrupted service despite continuous model tweaks
  • Partnered with UX designers in conducting user interviews, synthesizing research results, and leveraging learnings to iterate on the data science powered application feature increasing the application’s overall adoption rate by 60%
  • Adopted agile for data science by working with product managers to write stories and acceptance criteria for better stakeholder visibility into data science efforts as well as to keep data science efforts aligned with product portfolio goals
  • Created and led the execution of an interdisciplinary workshop which resulted in a better understanding of the data science practice by other disciplines and vice versa, ultimately leading to more efficient collaboration
Bank of America   Lead Engineer, Analyst   Chennai, India   Oct 2015 – Mar 2018
  • Developed regression-based tools to generate large volumes of test user accounts modeled after production data for user action simulation in lower environments that unlocked potential to test integration with third-party applications
  • Performed principal component analysis (PCA) and built classification models using support vector machines (SVM) with linear and Gaussian kernels, decision trees, and random forests to auto-select the landing page for users after login
  • Partnered with performance engineers to develop predictive time-series models to analyze live applications’ performance and to auto-detect 40% of anomalies in near real-time
  • Created dynamic dashboards that enabled real-time monitoring of server performance KPIs and provided customized metric sets for audits; this improved reliability and reduced manual effort by up to 70%
  • Developed a text mining and classification algorithm to extract specific information and add relevant tags to historical project documentation, unlocking the ability to automate search
  • Visualized findings in Tableau; presented results and recommendations to business stakeholders; mined useful insights
Mindtree Limited   Senior Engineer   Bangalore, India   Jan 2015 – Oct 2015
  • Collected business requirements and translated to data requirements to create analytical, predictive, and prescriptive tools
  • Processed, cleansed, and verified data integrity and performed feature engineering to standardize data for predictive modeling
  • Created time series models to predict user loads on all business-critical workflows
  • Ramped up on latest business intelligence and data mining tools; subsequently delivered multiple training sessions across teams
Tata Consultancy Services   Systems Engineer   Bangalore, India   Jul 2011 – Dec 2014
  • Worked with global banking and healthcare clients and engaged in several projects for application performance analysis
  • Cleaned and processed datasets; performed exploratory data analysis to inform model design
  • Developed and implemented a tool to cluster application users by a combination of different demographic factors
  • Analyzed results, visualized and presented outcomes, ideas, and recommendations for improvement to business stakeholders


Northeastern University   Masters in Analytics (GPA 4.0/4.0)   Boston, United States   Apr 2018 – Jun 2019
Relevant Courses – Predictive Analytics, Data-Driven Decision Making, Information Security Governance, Information Systems Development, Enterprise Analytics, Risk Management, Data Mining Applications, Data Management and Big Data  
PESIT (Autonomous under VTU)   Bachelor of Engineering in Biotechnology (GPA 7.5/10.0)   Bangalore, India   Sep 2007 – Jun 2011
Relevant Courses – Python, ASP .Net Development, Web Services, Genetic Engineering, Bioinformatics


Distinguished performer (1 annual award at Humana, 2 annual awards at Bank of America, 1 quarterly client award at Mindtree, 2 annual client awards at TCS), Coding (1st Prize for C Programming at a National Tech Fest), Creative gymnastics (National level medalist), Several running & biking marathons, Bank of America GetActive Community, Loyola Honor Society, Over 500 volunteer hours at global non-profits
