Currently, our implementation (the hash_corpus() method) handles the different nesting levels of a corpus recursively. This is not ideal: to loop over a nested list and alter it, we must deepcopy it first, which increases our memory footprint.
One possible solution would be to read the input lazily (from a prepared .json file?).
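
A rough sketch of what lazy reading could look like, not the actual hash_corpus() code. It assumes the prepared file holds one JSON document per line (JSON Lines), which is only one possible layout. The names hash_token, hash_nested and iter_hashed are hypothetical, and SHA-256 stands in for whatever hashing hash_corpus() really applies. Each document is still walked recursively, but the result is built as a new list, so no deepcopy is needed and only one document is held in memory at a time.

```python
import hashlib
import json
from typing import Iterable, Iterator, Union

Nested = Union[str, list]


def hash_token(token: str) -> str:
    # Hypothetical stand-in for the hashing done inside hash_corpus().
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def hash_nested(item: Nested) -> Nested:
    # Build a new structure instead of altering the input,
    # so the input never has to be deep-copied.
    if isinstance(item, str):
        return hash_token(item)
    return [hash_nested(child) for child in item]


def iter_hashed(lines: Iterable[str]) -> Iterator[Nested]:
    # Assumption: every non-empty line is one JSON document
    # (a string or an arbitrarily nested list of strings).
    for line in lines:
        line = line.strip()
        if line:
            yield hash_nested(json.loads(line))
```

Because iter_hashed is a generator, it can be fed an open file handle directly (for example `for doc in iter_hashed(open_file): ...`), and documents are parsed and hashed one by one as the caller consumes them.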