I found some code for crawling and cleaning the data.

Crawling with a specific keyword: https://gist.github.com/vickyqian/f70e9ab3910c7c290d9d715491cde44c

Pre-processing the data: https://github.com/CrisisNLP/deep-learning-for-big-crisis-data/blob/master/data_helpers/preprocess.py
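
To give a rough idea of what the cleaning step usually involves, here is a minimal sketch. It assumes the crawled data is short social-media text (for example tweets). It is not the code from the linked preprocess.py; the function name clean_text and the specific cleaning rules are my own assumptions, so adjust them to your data.

```python
import re

# Patterns compiled once; all of these rules are illustrative assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")       # punctuation, emoji, etc.
WHITESPACE_RE = re.compile(r"\s+")              # runs of whitespace


def clean_text(text: str) -> str:
    """Lowercase the text and strip URLs, mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")       # keep the hashtag word, drop the '#'
    text = NON_ALNUM_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("RT @user: Flood in #Chennai!! http://t.co/abc Stay safe"))
```

Note that this keeps tokens like "rt" and removes all non-ASCII characters, which may not be what you want for non-English text.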