Bounding Box
2D rectangles around objects of interest — the workhorse of detection training. Sounds simple. Done correctly at volume, it is not.
// Details
- Object detection (YOLO, DETR, RT-DETR)
- Surveillance, retail analytics, autonomous driving
- Two-pass QA, with box tightness checked by sampling IoU against a gold set
- Class-balance reports per delivery
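The tightness check above can be illustrated with a small sketch. It assumes boxes are stored as (x_min, y_min, x_max, y_max) tuples keyed by an annotation id, and that a sampled box passes when its IoU with the gold box meets a threshold. The function names and the 0.9 threshold are hypothetical, chosen only for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Intersection rectangle; width/height clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tightness_failures(annotated, gold, threshold=0.9):
    """Ids of gold-set boxes whose annotated counterpart falls below the IoU threshold.

    The threshold value is an assumption for this sketch, not a documented target.
    """
    return [
        box_id
        for box_id, gold_box in gold.items()
        if box_id in annotated and box_iou(annotated[box_id], gold_box) < threshold
    ]
```

A loose box (for example one drawn 2 px too tall on a 10 px object) lowers IoU noticeably, which is why IoU against a gold set works as a tightness signal.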
// Output formats
Polygon & Instance Segmentation
Tight, class-aware masks for irregular shapes. Per-instance even when objects of the same class overlap.
// Details
- Instance segmentation (Mask R-CNN, Mask2Former)
- Medical imaging, agriculture, defect inspection
- Boundary deviation ≤ 2 px on long edges
- Inter-annotator IoU ≥ 0.85 on convergence
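As a rough illustration of the inter-annotator IoU figure above, the sketch below compares two binary masks of the same instance drawn by different annotators. It assumes masks are equal-sized nested lists of 0/1 values; the function name is hypothetical, and treating two empty masks as full agreement is a convention chosen for this example.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a and pixel_b:
                inter += 1
            if pixel_a or pixel_b:
                union += 1
    # Assumption: two empty masks count as perfect agreement.
    return inter / union if union else 1.0
```

An instance would meet the stated bar when `mask_iou(annotator_1, annotator_2) >= 0.85`.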
// Output formats
Semantic Segmentation
Per-pixel class maps. Dense, label-everything output for scene understanding tasks.
// Details
- Autonomous driving (Cityscapes-style labels)
- Satellite / aerial imagery
- Medical segmentation (organs, lesions)
- CVAT with SAM-assisted brush, pixel-level QA overlays
// Output formats
Keypoint & Landmark
Pose estimation, facial landmarks, skeletal annotation, and per-class custom keypoint schemes.
// Details
- Pose estimation (sports, fitness, rehab)
- Facial landmarks for AR / animation
- Body 17 / 25 keypoint conventions
- Facial 68 / 98 / 106 landmark schemes
// Output formats
LiDAR / 3D Annotation
Point-cloud labeling, 3D cuboid boxes, and sensor-fusion datasets for AV and robotics.
// Details
- Autonomous driving (LiDAR-only or fusion)
- Robotics, drone perception
- 3D cuboids with heading & velocity
- Multi-sweep tracking IDs, image-LiDAR fusion projection
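To make the cuboid-with-heading idea concrete, here is a minimal sketch that expands one cuboid label into its eight corner points. It assumes a label parametrized by a center, a (length, width, height) size, and a heading given as yaw in radians about the vertical z axis. Conventions differ between datasets, so this layout and the function name are assumptions rather than a fixed format.

```python
import math


def cuboid_corners(center, size, yaw):
    """Eight corners of a 3D cuboid rotated by `yaw` radians about the z axis.

    center: (x, y, z) of the cuboid center
    size:   (length, width, height), with length along the heading direction
    """
    cx, cy, cz = center
    length, width, height = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                # Offset in the cuboid's own frame, then rotate into the sensor frame.
                dx, dy, dz = sx * length / 2, sy * width / 2, sz * height / 2
                corners.append((
                    cx + dx * cos_y - dy * sin_y,
                    cy + dx * sin_y + dy * cos_y,
                    cz + dz,
                ))
    return corners
```

Projecting these corners through a camera calibration is one way an image-LiDAR fusion overlay can be drawn; the calibration step is outside this sketch.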
// Output formats
NLP / Text Labeling
NER, intent classification, span labeling, RLHF preference data — annotation that goes beyond simple category labels.
// Details
- Named entity recognition (NER)
- Intent & sentiment classification
- Preference pairs for RLHF / DPO
- Span annotation for SQuAD-style reading comprehension
// Output formats
Audio Transcription
ASR-grade transcription, speaker diarization, and intent tagging for conversational AI.
// Details
- ASR training transcription at ≤ 5% WER
- Speaker diarization across multi-party calls
- Intent / entity tagging on utterances
- Language support: English, Hindi, regional Indian languages
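The WER target above refers to word error rate: word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. The sketch below is one common way to compute it, assuming whitespace tokenization and no text normalization; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

In practice, casing and punctuation are usually normalized before scoring, since otherwise "Hello," and "hello" count as an error.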
// Output formats
Dataset Curation & Audit
Label audits, class-balance reports, dataset cleaning, and edge-case mining. The unglamorous work that changes model performance.
// Details
- Full-dataset label audits with error categorization
- Class-balance and distribution analysis
- Edge-case mining with active learning support
- Consistency checks across label rounds
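A class-balance report like the one listed above can be as simple as per-class counts, shares, and the ratio between the most and least frequent class. The following is a minimal sketch assuming labels arrive as a flat list of class names; the report layout and function name are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Per-class counts and shares, plus the max/min class-count ratio."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    report = {
        cls: {"count": n, "share": n / total}
        for cls, n in counts.most_common()
    }
    # A ratio of 1.0 means perfectly balanced; larger values mean more skew.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return report, imbalance_ratio
```

A high imbalance ratio flags classes that may need targeted collection or edge-case mining.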