Open to opportunities

Allen Chima
Data Scientist

MSc Data Science student at the University of Wolverhampton, building expertise in big data, geospatial analysis, machine learning, and interactive data visualisation.

Turning data into decisions

I'm Allen Chima, an MSc Data Science student based in Wolverhampton, UK. My work spans the full data science pipeline, from ingesting and cleaning raw datasets to building interactive dashboards and deploying machine learning models.

During my MSc, I've worked with technologies such as Apache Spark for distributed data processing, MongoDB for enterprise NoSQL design, GeoPandas for spatial analysis, and Plotly & Streamlit for interactive data applications.

I'm particularly interested in applying data science to problems with real-world impact, such as global health trends, urban air quality, and energy site selection.

7 MSc Modules Completed
5+ Portfolio Projects
10+ Technologies Used
MSc Data Science (In Progress)

Skills & Tools

Languages: Python, SQL, Markdown
Data Science: pandas, NumPy, Scikit-Learn, SciPy, Jupyter
Big Data: Apache Spark, PySpark, Hadoop
Databases: MongoDB, PyMongo, NoSQL
Visualisation: Plotly, Matplotlib, Seaborn, Streamlit
Geospatial: GeoPandas, Folium, QGIS
Dev Tools: Git, GitHub, VS Code

MSc Data Science Modules

Seven modules completed across two semesters, each with a dedicated GitHub repository containing full assessment submissions and documentation.

Selected Projects

Independent and assessed projects demonstrating applied data science skills across analysis, visualisation, machine learning, and dashboard development.

🌍
HIV ART Coverage Analysis
Interactive exploratory data analysis (EDA) of global HIV antiretroviral therapy (ART) coverage among adolescents (2010–2021). Features custom imputation of censored values, choropleth world maps, Sankey diagrams, and treemaps built with Plotly.
Python, Plotly, EDA, Public Health
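Public health indicators such as ART coverage are often published as censored strings like "<1" or ">95" rather than exact numbers, so they must be converted before plotting. The sketch below shows one hypothetical way to do that; the rules used here (half the bound for "<x", the midpoint between x and a 100% cap for ">x") are assumptions for illustration, not the rules used in the project.

```python
import math


def impute_censored(value: str, upper_limit: float = 100.0) -> float:
    """Convert a possibly censored percentage string to a float.

    Hypothetical rules (assumptions, not the project's method):
      "<x" -> x / 2                  (left-censored: half the reported bound)
      ">x" -> (x + upper_limit) / 2  (right-censored: midpoint up to the cap)
      ""   -> NaN                    (missing values stay missing)
      "x"  -> float(x)               (exact values pass through)
    """
    text = value.strip()
    if not text:
        return math.nan
    if text.startswith("<"):
        return float(text[1:]) / 2
    if text.startswith(">"):
        bound = float(text[1:])
        return (bound + upper_limit) / 2
    return float(text)


raw = ["<1", "42", ">95", ""]
print([impute_censored(v) for v in raw])  # [0.5, 42.0, 97.5, nan]
```

Whatever rule is chosen, keeping the imputed values flagged (for example in a separate boolean column) makes it easy to show readers which points on a map are estimates.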
🌫️
Beijing Air Quality Analysis
Analysis of air quality data from 12 PRSA monitoring stations in Beijing (2013–2017), tracking PM2.5, PM10, NO₂, and meteorological factors. Includes a fully interactive Streamlit dashboard.
Python, Streamlit, pandas, Environmental
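A core step in any multi-station comparison is aggregating hourly readings per station while handling gaps. This stdlib-only sketch assumes PRSA-style column names (`PM2.5`, `station`) and `NA` for missing readings; the sample rows are invented, not real measurements.

```python
import csv
import io
from collections import defaultdict

# Toy rows in an assumed PRSA-style layout (made-up values, not real data).
SAMPLE_CSV = """year,month,day,hour,PM2.5,station
2013,3,1,0,4,Aotizhongxin
2013,3,1,1,8,Aotizhongxin
2013,3,1,0,NA,Changping
2013,3,1,1,10,Changping
"""


def station_means(rows, pollutant="PM2.5"):
    """Return the mean of `pollutant` per station, skipping missing readings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        reading = row[pollutant]
        if reading in ("", "NA"):
            continue  # treat a missing hourly reading as absent, not as zero
        totals[row["station"]] += float(reading)
        counts[row["station"]] += 1
    # Stations with no valid readings never enter `counts`, so no division by zero.
    return {station: totals[station] / counts[station] for station in counts}


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
print(station_means(rows))  # {'Aotizhongxin': 6.0, 'Changping': 10.0}
```

With pandas, `df.groupby("station")["PM2.5"].mean()` gives the same per-station means, since `read_csv` parses `NA` as NaN by default and `mean` skips NaN.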
🗺️
Geospatial Wind Energy Assessment
Wind energy site suitability assessment using GeoPandas and Folium. Applies multi-criteria spatial analysis to identify the most suitable locations based on geographic and environmental constraints.
GeoPandas, Folium, Spatial Analysis, GIS
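Multi-criteria site selection is often framed as a weighted linear combination: each criterion is normalised to a common 0–1 scale, hard constraints remove sites outright, and the remaining sites are ranked by a weighted sum. The sketch below illustrates that idea on a plain list of candidate sites rather than GeoPandas layers; the criteria (`wind_speed`, `grid_distance`, `protected`), the weights, and the site values are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    wind_speed: float     # m/s, higher is better (hypothetical criterion)
    grid_distance: float  # km, lower is better (hypothetical criterion)
    protected: bool       # hard constraint: excluded when True


def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when lower raw values are preferable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def suitability(sites, w_wind=0.7, w_grid=0.3):
    """Rank unconstrained sites by a weighted sum of normalised criteria."""
    candidates = [s for s in sites if not s.protected]
    if not candidates:
        return []
    wind = min_max([s.wind_speed for s in candidates])
    grid = min_max([s.grid_distance for s in candidates], higher_is_better=False)
    scores = [w_wind * a + w_grid * b for a, b in zip(wind, grid)]
    ranked = zip((s.name for s in candidates), scores)
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


sites = [
    Site("A", 7.5, 12.0, False),
    Site("B", 6.0, 3.0, False),
    Site("C", 8.2, 20.0, True),  # windiest, but excluded by the constraint
    Site("D", 5.0, 30.0, False),
]
for name, score in suitability(sites):
    print(name, round(score, 2))
```

In a GeoPandas workflow, the constraint step would more likely be a spatial operation, for example dropping candidate geometries that intersect protected-area polygons, before any scoring takes place.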
📈
Data Science Portfolio
A collection of independent machine learning projects, including gold price time-series prediction and a house price prediction pipeline. Demonstrates regression modelling, feature engineering, and model evaluation with Scikit-Learn.
Scikit-Learn, ML, Time Series, Regression
Big Data with Apache Spark
Large-scale data processing using Apache Spark and PySpark. Covers distributed computing fundamentals, RDDs, DataFrames, Spark SQL, and performance optimisation for big data workloads.
Apache Spark, PySpark, Distributed, Big Data
🗄️
MongoDB Enterprise Database Design
Design and implementation of a MongoDB NoSQL database for enterprise-scale data management. Covers schema design, indexing strategies, aggregation pipelines, and data architecture patterns.
MongoDB, PyMongo, NoSQL, Enterprise

Let's Connect

I'm open to data science opportunities, collaborations, and conversations. Feel free to reach out via LinkedIn or explore my work on GitHub.