Collect old robots.txt files from the Wayback Machine and download the Disallow paths
Install:
$ go install github.com/ogow/waybackrobots@latest

Usage:
Just give the -domain flag a valid domain and it will start downloading all archived robots.txt responses from 2015 onwards:
$ waybackrobots -domain google.com

Get all responses starting from a custom year:
$ waybackrobots -domain google.com -fd 2020

Sometimes the Wayback Machine API can return a lot of results, which takes a long time to download. To avoid this, the -strat flag can be used.
The -strat flag takes one of the values day, month, or digest; digest is the default.
Usually digest will be the go-to value, but for a domain like google.com that has been archived a lot, the digest filter will still return a lot of results.
In that case we can use the day filter instead, which keeps only one snapshot per day.
Explanation of the collapse filters in use: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#collapsing
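For reference, a minimal sketch of the kind of CDX query such a strategy maps to. How waybackrobots builds its requests internally is an assumption here; the collapse values follow the CDX server docs linked above, where timestamp:8 keeps one snapshot per day and timestamp:6 one per month.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Assumed mapping of the -strat values to CDX "collapse" filters.
	collapse := map[string]string{
		"digest": "digest",      // drop adjacent snapshots with identical content
		"day":    "timestamp:8", // keep one snapshot per day (YYYYMMDD prefix)
		"month":  "timestamp:6", // keep one snapshot per month (YYYYMM prefix)
	}

	q := url.Values{}
	q.Set("url", "google.com/robots.txt")
	q.Set("from", "2015")                 // corresponds to the -fd flag
	q.Set("filter", "statuscode:200")     // only successful captures
	q.Set("collapse", collapse["digest"]) // corresponds to the -strat flag
	q.Set("output", "json")

	resp, err := http.Get("https://web.archive.org/cdx/search/cdx?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Prints one row per surviving capture (urlkey, timestamp, original URL, ...).
	fmt.Println(string(body))
}

Switching the collapse value between digest, timestamp:8, and timestamp:6 reproduces the digest/day/month behaviour compared below.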
Comparing digest with day and month:
$ go run . -domain google.com -strat digest
[i] found 38261 old robots.txt files
$ go run . -domain google.com -strat day
[i] found 473 old robots.txt files
$ go run . -domain google.com -strat month
[i] found 122 old robots.txt files
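Each capture in the CDX listing can then be fetched from the Wayback Machine and its Disallow lines collected. A minimal sketch, assuming a hypothetical timestamp and using the id_ URL modifier to request the raw capture; whether waybackrobots fetches snapshots exactly this way is an assumption.

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical capture timestamp; the Wayback Machine redirects to the
	// nearest real capture. "id_" asks for the raw content without replay rewriting.
	snapshot := "https://web.archive.org/web/20200101000000id_/https://google.com/robots.txt"

	resp, err := http.Get(snapshot)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print the Disallow paths from the archived robots.txt.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(strings.ToLower(line), "disallow:") {
			fmt.Println(strings.TrimSpace(line[len("disallow:"):]))
		}
	}
}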

Help:
Usage of waybackrobots:
-domain string
which domain to find old robots for
-fd int
choose date from when to get robots from format: 2015 (default 2015)
-strat string
interval to get robots for, possible values: digest, day, month (default "digest")