If you capture a video and SLAM map of the whole space, you could use a VQA model like Cosmos Reason offline to extract keypoints and descriptions. Maybe even plan the route offline for open-ended tasks like "clean the kitchen". Then load the route, and all you need online is localization and obstacle avoidance.
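A minimal sketch of that online execution loop (all names here are hypothetical, for illustration): the offline planner's output is just a waypoint list, and the robot only needs a pose estimate and a blocked-leg check. Note that with no online replanning, a blocked leg simply halts the run.

```python
def follow_route(waypoints, localize, is_blocked):
    """Drive a precomputed route using only localization + an obstacle check.

    Hypothetical sketch: waypoints is the offline planner's (x, y) list,
    localize() returns the current pose estimate, and is_blocked(a, b)
    reports whether the leg a -> b is obstructed. The real motion
    controller is stubbed out by jumping straight to each waypoint.
    Returns the index of the first unreachable waypoint, or
    len(waypoints) if the whole route was driven.
    """
    pos = localize()
    for i, wp in enumerate(waypoints):
        # No online replanning: a blocked leg just stops the run.
        if is_blocked(pos, wp):
            return i
        pos = wp  # stand-in for "drive to wp"
    return len(waypoints)
```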
Aaah interesting, does stuff like this generalise to furniture moving around, different lighting conditions and so on? Also sounds like if the route gets blocked it just won't move.
Admittedly, my use of CUDA and Metal is fairly surface-level. But I have had great success using LLMs to convert whole gaussian splatting CUDA codebases to Metal. It's not ideal for maintainability and not 1:1, but if CUDA was a moat for NVIDIA, I believe LLMs have dealt a blow to it.
You can convert CUDA codebases to Vulkan and DirectX code, for all the good it does you. You're still constrained by the architecture of the GPU, and Apple Silicon GPUs pre-M5 are all raster-optimized. The hardware is the moat.
Apple technically hasn't supported the professional GPGPU workflow for over a decade. macOS doesn't support CUDA anymore, Apple abandoned OpenCL on all of their platforms and Metal is a bare-minimum effort equivalent to what Windows, Android and Linux get for free. Dedicated matmul hardware is what Apple should have added to the M1 instead of wasting silicon on sluggish, rinky-dink NPUs. The M5 is a day late and a dollar short.
To do gaussian splatting anywhere near real time, you need good depth data to initialize the gaussian positions. This can of course come from monocular depth estimation, but then you're back to the monocular-depth-vs-lidar tradeoff.
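That initialization step amounts to back-projecting the depth map through the camera intrinsics to get a point cloud that seeds the gaussian means. A minimal NumPy sketch under a pinhole camera model (function name and signature are mine, not from any splatting library):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) to camera-space 3D points.

    Hypothetical helper: fx, fy, cx, cy are pinhole intrinsics assumed
    known from calibration. The returned (N, 3) array can seed the
    initial gaussian positions in place of an SfM point cloud.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```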