It seems to use a lot of RAM when the chunk radius is high (needs to be confirmed).
If that is confirmed, the diffusion could be optimized by driving the traversal with a stack instead of the BFS queue (https://www.ibm.com/developerworks/aix/library/au-aix-stack-tree-traversal/index.html).
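To help confirm the RAM claim before rewriting anything, the sketch below runs the same diffusion with a FIFO queue (BFS) and with a LIFO stack, and reports the peak number of pending chunks for each. It is a minimal model under assumptions that are not in the original note: chunks are `(x, z)` pairs, diffusion spreads to the 4 horizontal neighbours, and it stops at Manhattan distance `radius` from the origin. `diffuse`, `in_range` and `NEIGHBOURS` are hypothetical names, not the project's code.

```python
from collections import deque

# Hypothetical model: chunks are (x, z) integer pairs, diffusion spreads to the
# 4 horizontal neighbours and stops at Manhattan distance `radius` from the origin.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def in_range(chunk, radius):
    x, z = chunk
    return abs(x) + abs(z) <= radius


def diffuse(radius, use_stack):
    """Visit every chunk within `radius`; return (visited set, peak pending size).

    use_stack=False -> FIFO queue (BFS); use_stack=True -> LIFO stack (DFS order).
    """
    start = (0, 0)
    pending = deque([start])
    visited = {start}  # marked on push so no chunk is ever pending twice
    peak = 1
    while pending:
        # The only difference between the two strategies is which end we pop from.
        x, z = pending.pop() if use_stack else pending.popleft()
        for dx, dz in NEIGHBOURS:
            nxt = (x + dx, z + dz)
            if nxt not in visited and in_range(nxt, radius):
                visited.add(nxt)
                pending.append(nxt)
        peak = max(peak, len(pending))
    return visited, peak


if __name__ == "__main__":
    for r in (4, 16, 64):
        bfs_cells, bfs_peak = diffuse(r, use_stack=False)
        stack_cells, stack_peak = diffuse(r, use_stack=True)
        print(f"radius={r} cells={len(bfs_cells)} "
              f"bfs_peak={bfs_peak} stack_peak={stack_peak}")
```

Note that in this model both strategies still keep a `visited` set of all 2r² + 2r + 1 chunks, so if that set dominates memory, changing the traversal order alone may not reduce RAM; comparing the two peak values is what would show whether the pending container is the real cost.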