Computer Engineer
Henrique Rocha.
Building production-grade ML systems for agritech and industry. Computer Vision · MLOps · Data Engineering.
YOLO v5/v8 object detection and EfficientNet classification trained on agricultural datasets. ONNX Runtime inference for low-latency edge deployment. Research published at IMVIP.
Work
IMVIP — Agricultural Risk Platform
AI-powered platform for rural credit risk assessment using EfficientNet leaf disease classification and Sentinel-2 satellite scoring. Delivers an explainable AQR score (0–100) across 7 risk components for banks, cooperatives, and insurers.
Leaf Disease Classifier
Coffee leaf disease classifier trained on the BRACOL and RoCoLe datasets with EfficientNet-B2, exported to ONNX Runtime for low-latency edge inference. Research published as a scientific article.
Satellite Risk Engine
Multi-layer risk pipeline combining NDVI, climate indices (ERA5/CHIRPS/SPI), SoilGrids data, and environmental compliance checks into a structured farm-level risk report.
Object Detection System
Production object detection pipeline using YOLO v8 and OpenCV for agricultural field monitoring, integrated with GCP and orchestrated via Apache Airflow for high-volume data processing.
ML for Agricultural Risk Assessment
Machine learning approach to rural credit risk scoring combining satellite-derived NDVI, climate indices (SPI/CHIRPS/ERA5), SoilGrids data, and computer vision disease detection into a structured farm-level risk report.
What is AQR?
Annotation Quality Ranking
Train on 25% of the data. Keep ≥97% of the mAP.
AQR is a two-stage data selection method for object detection. Instead of training on the full dataset, it ranks images by annotation quality before training — isolating the most informative 25% of samples. The core hypothesis: selecting the right data beats training on everything.
Pipeline
Removes images whose bounding-box density is incompatible with the test set. A Gaussian score centered on μ_test = 1.91 eliminates ~34% of images, keeping only the distributionally compatible subset.
exp(−0.5 × ((n − μ) / σ)²) ≥ 0.50
Within the compatible subset, images are ranked by richness (bbox count diversity, 60%) + precision (bbox centrality and area quality, 40%). The top quartile (Q4) is selected for training.
0.60 × richness + 0.40 × precision
Q4v5 (25% of data) reaches ≥97% of full-dataset mAP. Confirmed: VOC 2012 (97.5%) and CGIAR Wheat (97.9%) with YOLOv8s. Conditional on datasets with ≥2000 images per quartile.
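The two stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the text does not give σ, so SIGMA is a placeholder value; richness and precision are taken as precomputed per-image scores in [0, 1] because their exact definitions are not given here; the quartile is taken over the compatible subset; and the function names are hypothetical.

```python
import math

MU_TEST = 1.91    # mean bounding boxes per image in the test set (from the text)
SIGMA = 1.5       # ASSUMPTION: the text does not state sigma; placeholder value
MIN_SCORE = 0.50  # stage-1 compatibility threshold (from the text)


def density_score(n_boxes, mu=MU_TEST, sigma=SIGMA):
    """Stage 1: Gaussian score exp(-0.5 * ((n - mu) / sigma)^2)."""
    return math.exp(-0.5 * ((n_boxes - mu) / sigma) ** 2)


def quality_score(richness, precision):
    """Stage 2: weighted sum 0.60 * richness + 0.40 * precision."""
    return 0.60 * richness + 0.40 * precision


def select_top_quartile(images):
    """Filter by density compatibility, rank by quality, keep the top 25%.

    `images` is a list of dicts with keys: id, n_boxes, richness, precision.
    ASSUMPTION: richness and precision are already computed and lie in [0, 1].
    """
    compatible = [im for im in images if density_score(im["n_boxes"]) >= MIN_SCORE]
    ranked = sorted(
        compatible,
        key=lambda im: quality_score(im["richness"], im["precision"]),
        reverse=True,
    )
    keep = max(1, len(ranked) // 4) if ranked else 0
    return ranked[:keep]


# Hypothetical toy data, for illustration only.
images = [
    {"id": "a", "n_boxes": 2, "richness": 0.9, "precision": 0.8},
    {"id": "b", "n_boxes": 1, "richness": 0.2, "precision": 0.9},
    {"id": "c", "n_boxes": 3, "richness": 0.5, "precision": 0.5},
    {"id": "d", "n_boxes": 2, "richness": 0.1, "precision": 0.1},
    {"id": "e", "n_boxes": 12, "richness": 1.0, "precision": 1.0},
]
selected_ids = [im["id"] for im in select_top_quartile(images)]
```

In this toy set, image "e" has the highest quality score but is dropped in stage 1 because 12 boxes is far from the test-set density, which is the point of filtering before ranking.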
AQR-v5 ranks training data consistently across domains. Confirmed 7/7 dataset×model combinations. ρ̄ = +0.867. Perfect monotonic correlation ρ = +1.000 on PlantDoc (p < 0.05).
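The ρ values above read like a rank correlation between quality quartile and resulting mAP. The text does not name the statistic, so the sketch below assumes Spearman's ρ and computes it with the standard library only. The mAP numbers in the example are hypothetical and are not results from this work.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical example: quartile index vs. mAP of a model trained on it.
quartiles = [1, 2, 3, 4]
map_monotonic = [0.41, 0.47, 0.52, 0.58]  # strictly increasing with quartile
map_one_swap = [0.50, 0.40, 0.60, 0.70]   # Q1 and Q2 out of order

rho_monotonic = spearman_rho(quartiles, map_monotonic)
rho_one_swap = spearman_rho(quartiles, map_one_swap)
```

A value of +1 means mAP rises strictly with the quartile index. A single swap between adjacent quartiles already lowers ρ, which makes it a strict check on four points.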
Skills
AI & Computer Vision
Machine Learning, DL & LLMs
Data Engineering
MLOps & Cloud
Backend & Architecture
Monitoring & Methods
Highlights
About
AI & Data Engineer with hands-on experience in Computer Vision, MLOps, and end-to-end data pipelines in production across agribusiness and industrial sectors. Built an automated object detection system with YOLO/OpenCV on GCP and published research on identifying rust in coffee leaves. Solid engineering base in microservices, DDD, Docker, and CI/CD — combined with expertise in Python, Apache Airflow, DBT, and SQL.