Actually, I'm just using the summed luminance difference between consecutive frames, with noise reduction by ignoring per-pixel differences under a specified limit. It's a very naive implementation, but it seems to work well enough. It's also very light on the CPU; it ran fine even on a Raspberry Pi.
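
For anyone curious, here is a minimal sketch of that idea, not the actual implementation. It assumes frames are nested lists of `(r, g, b)` tuples and uses Rec. 601 luma weights, since the post doesn't say how luminance is computed. The names `motion_score`, `noise_limit`, and `trigger`, and their default values, are all made up for illustration.

```python
def luminance(pixel):
    # Rec. 601 luma weights: an assumption, not taken from the post.
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def motion_score(prev_frame, curr_frame, noise_limit=10.0):
    """Sum per-pixel luminance differences between two frames.

    Differences under noise_limit are treated as sensor noise and ignored.
    """
    total = 0.0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for prev_px, curr_px in zip(prev_row, curr_row):
            diff = abs(luminance(curr_px) - luminance(prev_px))
            if diff >= noise_limit:
                total += diff
    return total


def motion_detected(prev_frame, curr_frame, noise_limit=10.0, trigger=500.0):
    # Hypothetical decision rule: report motion once the summed difference
    # reaches the trigger value.
    return motion_score(prev_frame, curr_frame, noise_limit) >= trigger
```

Both thresholds would need tuning per camera and scene. Plain lists keep the sketch dependency-free; a real version would more likely work on array buffers.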
Wouldn't it be better to do shape detection before comparing? That way clouds shouldn't matter.