Core Data Science Competencies
- Exploratory data analysis (EDA), data cleaning, missing-data strategies, outlier detection
- Feature engineering (scaling, encoding, transformations, variable selection)
- Model development: supervised & unsupervised ML (classification, regression, clustering)
- Model selection: algorithm comparison, hyperparameter tuning, validation strategies, overfitting control
- Model evaluation: accuracy, precision/recall, ROC-AUC, F1, confusion matrices, cross-validation
- Workflow design: reproducible pipelines, version control, environment management
Analytical & Modelling Skills
- Statistical modelling: GLM, GAM, mixed-effects, hierarchical Bayesian models
- Spatiotemporal modelling: species distribution models, spatial autocorrelation, forecasting
- Machine learning: Linear/Logistic Regression, Random Forests, Gradient Boosting, k-NN, Decision Trees, SVMs, clustering (K-Means); hyperparameter tuning; scikit-learn Pipelines
- Deep learning: foundational neural networks using TensorFlow/Keras (feedforward models)
- Dimensionality reduction: PCA (ordination, variance structure, visualisation)
- Bayesian inference: hierarchical & spatiotemporal models, detection–abundance separation
- Forecasting & simulation: demographic forecasting, Monte Carlo, scenario modelling
- Model interpretation: feature importance, partial dependence, SHAP
- Probability modelling: Monte Carlo simulation, hypergeometric frameworks
- Introductory recommender systems: collaborative filtering and similarity metrics
- Introductory NLP: text cleaning, tokenisation, vectorisation (CountVectorizer/TF-IDF), Naive Bayes classification
Technical Stack
Languages & Libraries
- Python: pandas, NumPy, scikit-learn, Matplotlib, seaborn, Plotly; regex; logging; OOP
- R: tidyverse (dplyr/tidyr), ggplot2, sf/terra (spatial analysis), Shiny
- SQL: PostgreSQL (SELECT, JOIN, aggregation, subqueries, CTEs, window functions); pgAdmin
- Bayesian modelling: JAGS; familiarity with rstan-style workflows
Data Engineering & I/O
- File I/O: CSV, Excel, and plain-text files; web scraping (BeautifulSoup/lxml); basic PDF parsing
- Data integration: joining complex relational datasets, API + SQL workflows
Software Engineering
- Git/GitHub (branching, PRs), reproducible environments (venv/conda, renv)
- Code quality: pytest, assertions, type hints, docstrings
- Modular code design and packaging fundamentals
Databases & Storage
- PostgreSQL database setup and querying
- Microsoft Access (legacy support)
Visualisation
- Exploratory and diagnostic plots in Python and R
- Model evaluation and interpretability visualisation
- Interactive dashboards (Shiny, Plotly)
Geospatial & Remote Sensing
- Raster/vector workflows, spatial joins, canopy/landscape metrics
- Spatial modelling and geoprocessing pipelines
- Parallelised spatial workflows
Applied Expertise
- Pattern detection in high-dimensional datasets
- Forecasting for risk assessment and scenario exploration
- Decision-support tools for environmental managers and policy audiences
- High-performance geoprocessing and workflow optimisation
- Translating analytical outputs into actionable insights
Communication
- Technical writing: peer-reviewed publications, reports, documentation
- Scientific communication: presentations, workshops, stakeholder briefings
- Audience adaptation: tailoring explanations for technical and non-technical audiences