The experiment (September 2016):
- A site has existed since 2008 and has many snapshots on archive.org.
- The FAQ says:
You can exclude your site from display in the Wayback Machine by placing a robots.txt file on your web server that is set to disallow User-Agent: ia_archiver.
- Made the robots.txt as follows (a quick way to verify the rule programmatically is sketched right after this list):
User-agent: ia_archiver
Disallow: /
- In a couple of days, indeed, instead of the site's snapshots there was a message saying that the page is not available because of the robots.txt instruction. Success!
- Removed the robots.txt.
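
A quick programmatic check of the rule from step 3: the Python sketch below fetches the live robots.txt and asks whether ia_archiver may fetch the root page. This is only a sketch; example.com is a hypothetical stand-in, since the experimental site's actual address is not given above.

import urllib.robotparser

SITE = "https://example.com"  # hypothetical stand-in for the experimental site

parser = urllib.robotparser.RobotFileParser()
parser.set_url(SITE + "/robots.txt")
parser.read()  # fetch and parse the live robots.txt

# can_fetch() returns False when the given user agent is disallowed,
# i.e. while the "User-agent: ia_archiver / Disallow: /" rule is in place.
blocked = not parser.can_fetch("ia_archiver", SITE + "/")
print("ia_archiver is blocked:", blocked)
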
My expectations:
The old pages have disappeared from the archive completely, but the current one is now being crawled and included in the archive.
The reality:
All the old pages are back!
The bottom line:
Using robots.txt, one can instruct the Wayback Machine to stop displaying a site (which is exactly what the FAQ says!) but not to remove its pages from the archive.
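
This display-only behavior can be watched from the outside with the Wayback Machine Availability API (https://archive.org/help/wayback_api.php). A minimal sketch, again with example.com as a hypothetical stand-in: one would expect the API to report no snapshot while the robots.txt rule is in effect, and to report the old snapshots again once the rule is removed, which matches what happened above.

import json
import urllib.parse
import urllib.request

SITE = "https://example.com"  # hypothetical stand-in for the experimental site
api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(SITE, safe="")

with urllib.request.urlopen(api) as resp:
    data = json.load(resp)

# "archived_snapshots" comes back empty while display is blocked (or when
# nothing is archived); otherwise "closest" points at the displayed snapshot.
snap = data.get("archived_snapshots", {}).get("closest")
if snap and snap.get("available"):
    print("Displayed snapshot:", snap["url"], "from", snap["timestamp"])
else:
    print("No snapshot is currently displayed for", SITE)
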