Home service robots are increasingly deployed in unstructured, dynamic environments, where they must detect and interact with a wide variety of objects without prior training on specific classes. Traditional object detection methods rely on large-scale annotated datasets and often require cloud-based processing, which is impractical due to the diversity of household objects, evolving user environments, and privacy concerns. This thesis introduces HybridNAV, a hybrid framework that integrates zero-shot object detection with adaptive exploration for mobile robots, emphasizing privacy-preserving, real-time operation. HybridNAV combines the semantic generalization of CLIPSeg with the spatial precision of Grounding DINO and SAMv2, employing a density-aware spatial-confidence fusion strategy to enhance detection accuracy. The system was evaluated both in simulation (ScanNet dataset) and on a real-world Stretch 2 robot, validating its performance in diverse, unstructured environments. Despite achieving robust zero-shot detection and adaptive navigation, the system faces challenges related to occlusion sensitivity and computational overhead. Future work will explore model optimization and multi-view fusion techniques to address these limitations and further enhance deployment viability on resource-constrained platforms.
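The density-aware spatial-confidence fusion mentioned above can be pictured as a weighted blend of a semantic score (CLIPSeg-style) and a spatial score (detector-style), with the weight shifted by local object density. The sketch below is purely illustrative; the function name, weighting scheme, and density term are assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of a density-aware spatial-confidence fusion step.
# All names and weights here are illustrative assumptions.

def fuse_confidence(semantic_score: float,
                    spatial_score: float,
                    local_density: float,
                    alpha: float = 0.5) -> float:
    """Blend a semantic score with a spatial score, shifting weight
    toward the spatial term in cluttered regions (high local density),
    where semantic segmentation is more likely to bleed across objects."""
    # Weight on the spatial score grows linearly with density, capped at 1.
    w_spatial = alpha + (1.0 - alpha) * min(local_density, 1.0)
    return w_spatial * spatial_score + (1.0 - w_spatial) * semantic_score

# Example: in a cluttered scene (density 0.8), the spatial score dominates.
fused = fuse_confidence(semantic_score=0.6, spatial_score=0.9, local_density=0.8)
```

Under this toy scheme, a high-density region yields `w_spatial = 0.9`, so the fused score (0.87 here) tracks the spatial detector closely while still retaining some semantic evidence.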