What is the Wayback Archiver?
The Wayback Archiver is a free tool that helps researchers, journalists, and writers preserve web pages by archiving them to the Internet Archive's Wayback Machine. Submit your URLs, verify your email, and receive your archive links via email - no waiting at your computer required!
Success rate: Typically 50-70% depending on content type. The tool works best with academic papers, government sites, and independent blogs. Major news sites and social media often block automated archiving.
🚀 Getting Started
How do I use this tool?
- Enter your email address - You'll need this to receive your results
- Paste URLs (one per line) or upload a text/CSV file
- Adjust settings if needed (snapshot age)
- Click "Submit for Archiving"
- Check your email for a verification link
- Click the verification link to start processing
- Wait for processing (typically 10-60 minutes, depending on batch size) - Results will be emailed to you with a CSV file
💡 No need to wait! Close the browser after submitting. We'll email your results when ready.
What file formats can I upload?
You can upload .txt files (one URL per line) or .csv files (URLs in the first column). The tool will automatically detect and process the URLs.
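To make the two formats concrete, here is a minimal sketch of how such a file could be read. It is not the tool's actual code: the function name `extract_urls` and the rule of keeping only entries that start with http:// or https:// are assumptions made for illustration.

```python
import csv
import io


def extract_urls(filename: str, content: str) -> list[str]:
    """Return the URLs found in the text of an uploaded .txt or .csv file."""
    if filename.lower().endswith(".csv"):
        # CSV: take the first column of every non-empty row.
        rows = csv.reader(io.StringIO(content))
        candidates = [row[0] for row in rows if row]
    else:
        # Plain text: one URL per line.
        candidates = content.splitlines()
    # Assumption: keep only entries that look like web URLs.
    # This also skips blank lines and a CSV header row such as "url".
    return [
        c.strip()
        for c in candidates
        if c.strip().lower().startswith(("http://", "https://"))
    ]
```

If you prepare your own files, this is also a quick way to check that a list is clean before uploading it.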
Is there a limit to how many URLs I can archive?
Yes, for performance and to prevent abuse:
- 100 URLs maximum per submission
- 3 submissions maximum per email address per hour
⚠️ Rate Limiting: The tool automatically waits 1 second between creating new snapshots to be respectful of the Wayback Machine's resources.
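For readers curious how limits like these can be enforced, the sketch below shows one common approach: a sliding one-hour window per email address, plus a simple pause between snapshots. The class and function names are hypothetical, and this is an illustration under those assumptions rather than the tool's real implementation.

```python
import time
from collections import defaultdict, deque

MAX_URLS_PER_SUBMISSION = 100
MAX_SUBMISSIONS_PER_HOUR = 3
SECONDS_BETWEEN_SNAPSHOTS = 1.0


class SubmissionLimiter:
    """Sliding-window limiter: at most 3 accepted submissions per email per hour."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._history = defaultdict(deque)  # email -> times of accepted submissions

    def check(self, email: str, urls: list[str]) -> tuple[bool, str]:
        if len(urls) > MAX_URLS_PER_SUBMISSION:
            return False, "too many URLs"
        now = self._clock()
        recent = self._history[email.lower()]
        # Forget submissions that are an hour old or more.
        while recent and now - recent[0] >= 3600:
            recent.popleft()
        if len(recent) >= MAX_SUBMISSIONS_PER_HOUR:
            return False, "hourly limit reached"
        recent.append(now)
        return True, "ok"


def paced(urls, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index:
            sleep(SECONDS_BETWEEN_SNAPSHOTS)
        yield url
```

Passing the clock and the sleep function in as parameters keeps the sketch easy to test without actually waiting.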
Why do I need to verify my email?
Email verification helps us:
- Prevent spam and abuse of the service
- Ensure results go to the right person
- Protect the Internet Archive from misuse
💡 Your privacy matters: We only use your email to send verification and results. No marketing, no selling, no long-term storage.
⚙️ How It Works
What does "Maximum Snapshot Age" mean?
This setting (in days) determines how old an existing Wayback Machine snapshot can be before the tool creates a new one:
- 7 days (default): Uses existing snapshots up to a week old, creates new ones for older or missing snapshots
- 0 days: Always creates brand new snapshots (use for time-sensitive content)
- 30+ days: More forgiving, good for stable content that doesn't change often
💡 Tip: Using the default 7 days balances freshness with being respectful of Archive resources. Only use 0 days when you specifically need the absolute latest version.
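The decision behind this setting can be sketched in a few lines. The example assumes the latest snapshot is identified by a Wayback-style timestamp (YYYYMMDDhhmmss, in UTC); the function name `needs_new_snapshot` is hypothetical and the tool's own logic may differ in detail.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_new_snapshot(
    latest_snapshot: Optional[str], max_age_days: int, now: datetime
) -> bool:
    """Decide whether a new snapshot should be created.

    latest_snapshot is a Wayback-style timestamp (YYYYMMDDhhmmss, UTC),
    or None when no snapshot exists at all.
    """
    # No snapshot yet, or a setting of 0 days: always create a new one.
    if latest_snapshot is None or max_age_days <= 0:
        return True
    taken = datetime.strptime(latest_snapshot, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    # Reuse the existing snapshot only if it is within the allowed age.
    return now - taken > timedelta(days=max_age_days)
```

With the default of 7 days, a snapshot taken 5 days ago would be reused, while one taken 14 days ago would trigger a new capture.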
How long does processing take?
Processing happens in the background after you verify your email.
Processing time: 10-60 minutes depending on:
- Batch size (more URLs = longer)
- How many URLs fail and need retries
- Site complexity (slow sites take longer)
Typical estimates:
- Small batches (1-20 URLs): 10-15 minutes
- Medium batches (50-100 URLs): 30-60 minutes, sometimes more if many sites are blocking or slow
💡 Why this matters: The Internet Archive is a non-profit providing a free public service. Being patient helps ensure it remains available for everyone.
What's the difference between "Found Existing" and "Created New"?
- ✓ Found: The tool found a recent snapshot (within your max age setting) and reused it. This is faster and more efficient.
- + Created: No recent snapshot existed, so the tool created a brand new archive. This takes a bit longer.
- ✗ Failed: The URL couldn't be archived (see troubleshooting below).
Does the tool remove duplicate URLs?
Yes! The tool automatically removes duplicate URLs from your input before processing. You'll see a message showing how many duplicates were removed.
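As an illustration, duplicate removal that keeps the first occurrence of each URL and reports a count can look like the sketch below. It assumes duplicates are exact matches after trimming whitespace; the tool may compare URLs differently, and the function name is made up for this example.

```python
def remove_duplicates(urls: list[str]) -> tuple[list[str], int]:
    """Return (unique URLs in first-seen order, number of duplicates removed)."""
    cleaned = [u.strip() for u in urls if u.strip()]
    # dict.fromkeys keeps insertion order, so the first occurrence wins.
    unique = list(dict.fromkeys(cleaned))
    return unique, len(cleaned) - len(unique)
```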
🔧 Troubleshooting
What is a typical success rate?
Success rates vary by content type:
- Academic papers & government sites: 70-90% success rate
- Independent blogs & news sites: 60-80% success rate
- Major commercial news sites: 30-50% success rate (many block automated archiving)
- Overall typical range: 50-70% success rate
💡 The tool automatically retries failed URLs twice (waiting 60-120 seconds between attempts) to improve success rates for temporary failures.
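The retry behavior described above can be sketched as follows. This is only an illustration: `attempt` stands in for whatever actually submits a URL to the Wayback Machine, and choosing a random wait between 60 and 120 seconds is an assumption about how the 60-120 second range is applied.

```python
import random
import time
from typing import Callable


def archive_with_retries(
    url: str,
    attempt: Callable[[str], bool],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, int]:
    """Try once, then retry up to max_retries times.

    Returns (success, retry_count), where retry_count is 0 if the
    first try succeeded.
    """
    for retry_count in range(max_retries + 1):
        if retry_count:
            # Wait 60-120 seconds before each retry.
            sleep(random.uniform(60, 120))
        if attempt(url):
            return True, retry_count
    return False, max_retries
```

A URL that fails every time ends with a retry count of 2, meaning three attempts in total.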
Understanding Error Codes in Your Results
Your CSV includes an error_reason column explaining why each URL failed:
Common Errors & Solutions:
Rate limited (429)
What it means: Site detected automated requests and blocked them
Common on: Major news sites (ABC News, CNN, NYTimes)
Solution: Try archiving manually at web.archive.org
Server error (500, 502, 503, 523)
What it means: Website server issues or blocking archive attempts
Common on: Sites with Cloudflare protection
Solution: Try again later (often temporary), or archive manually
Site blocks Wayback Machine (403)
What it means: Website's robots.txt explicitly blocks archiving
Common on: Sites that don't want to be archived
Solution: Cannot be archived (respecting robots.txt is required)
Request timed out
What it means: Page took longer than 90 seconds to archive
Common on: Complex pages with lots of JavaScript/media
Solution: Try again (sometimes works on retry), or try at off-peak hours
Page not found (404)
What it means: URL doesn't exist or was deleted
Solution: Check if URL is correct
Connection error
What it means: Network issue or site is down
Solution: Try again later
💡 Check the retry_count column: Shows how many times we attempted archiving (0 = first try, 2 = tried 3 times total)
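The error categories above map fairly directly onto HTTP status codes. The sketch below shows one way such a mapping could be written; the label strings simply reuse the headings from this page, and the exact text that appears in your error_reason column may differ.

```python
from typing import Optional


def describe_failure(status: Optional[int], timed_out: bool = False) -> str:
    """Map an HTTP status (or the lack of one) to a human-readable reason."""
    if timed_out:
        return "Request timed out"
    if status is None:
        # No HTTP response at all: network problem or site down.
        return "Connection error"
    if status == 429:
        return "Rate limited (429)"
    if status == 403:
        return "Site blocks Wayback Machine (403)"
    if status == 404:
        return "Page not found (404)"
    if 500 <= status <= 599:
        return f"Server error ({status})"
    return f"Unexpected status ({status})"
```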
Which types of sites work best?
✓ Usually work well:
- Academic papers and journals
- Government and educational websites (.gov, .edu)
- Independent news sites and blogs
- Wikipedia and similar reference sites
- GitHub repositories and documentation
- Personal websites and portfolios
✗ Often fail:
- Major commercial news sites (NYTimes, WSJ, ABC News)
- Social media platforms (Twitter, Facebook, Instagram)
- Sites with aggressive bot detection
- Paywalled content
- Sites that explicitly block archiving (robots.txt)
- Pages requiring login
What should I do with failed URLs?
- Check the error reason in your CSV to understand why it failed
- For temporary errors (timeouts, server errors): Try submitting again later
- For rate limiting (429): Try archiving manually at web.archive.org
- For blocked sites (403): These cannot be archived via Wayback Machine
- Consider alternatives: Screenshots, PDF saves, or citation tools
💡 Manual archiving: Visit web.archive.org and paste the URL in the "Save Page Now" box. This sometimes works when automated tools fail.
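If you have many failed URLs to work through, a small helper can suggest a next step for each one and build a manual save link. This is a sketch under two assumptions: that error reasons contain the status codes shown on this page, and that opening https://web.archive.org/save/ followed by the page address starts a manual capture in your browser. The function names are hypothetical.

```python
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def next_step(error_reason: str) -> str:
    """Suggest what to do with a failed URL, based on its error reason."""
    reason = error_reason.lower()
    if "429" in reason or "rate limit" in reason:
        return "archive manually at web.archive.org"
    if "403" in reason or "blocks" in reason:
        return "cannot be archived; consider a screenshot or PDF"
    if "404" in reason:
        return "check that the URL is correct"
    # Timeouts, server errors, and connection errors are often temporary.
    return "resubmit later"


def manual_save_link(url: str) -> str:
    """Build a link that opens Save Page Now for the given URL."""
    return SAVE_PAGE_NOW + url.strip()
```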
Why did some URLs fail?
URLs can fail for several reasons:
- robots.txt blocking: The website blocks the Wayback Machine from archiving
- Paywall or login required: The Archive can't access content behind authentication
- Temporary network issue: The site was down when archiving was attempted
- Invalid URL: The URL format wasn't correct
💡 Try this: Wait a few minutes and try the failed URLs again. Sometimes the issue is temporary.
I didn't receive the verification email. What should I do?
Try these steps:
- Check your spam/junk folder - verification emails sometimes end up there
- Wait a few minutes - email delivery can take 1-2 minutes
- Check the email address - make sure you entered it correctly
- Try submitting again - the first attempt may have failed
💡 Add to contacts: Adding contact@theopenrecord.org to your contacts helps prevent emails from going to spam.
I didn't receive the results email. What should I do?
- Check spam/junk folder first
- Wait longer - large batches or complex pages can take 30-60 minutes or more
- Check if you clicked the verification link - processing doesn't start until you verify
- Try with fewer URLs - if you submitted 100 URLs, try 10 first to test
Can I archive pages that require login?
No. The Wayback Machine can only archive publicly accessible pages. Content behind logins, paywalls, or password protection cannot be archived using this tool.
🔒 Privacy & Data
Does this tool store my URLs or data?
We store minimal data temporarily:
- Your email & URLs: Stored temporarily (up to 24 hours) to process your request
- After processing: All data is automatically deleted
- No permanent storage: We don't keep logs, databases, or records of your submissions
💡 Your privacy: We only use your email to send verification and results. No marketing lists, no third-party sharing, ever.
Are the archived pages public?
Yes. When you archive a page to the Wayback Machine, it becomes part of the Internet Archive's public collection. Anyone can access archived snapshots.
⚠️ Important: Don't archive pages with sensitive, private, or confidential information. Archives are permanent and public.
⨠Best Practices
When should I use different "Maximum Snapshot Age" settings?
- 0 days: Breaking news, social media posts, or content that changes hourly/daily
- 1-7 days: News articles, blog posts, frequently updated content
- 7-30 days: Most general web pages (default is good here)
- 30+ days: Academic papers, historical documents, stable reference material
How should I organize my URLs for archiving?
💡 Best practice:
- Group related URLs together (e.g., all sources for one article)
- Create a text file with descriptive filename (e.g., "sources_climate_article_2024.txt")
- One URL per line, no extra formatting needed
- Keep a local copy of your URL lists for future reference
Should I archive pages immediately or wait?
Archive immediately if:
- The page might be deleted or changed soon
- It's breaking news or time-sensitive content
- You're citing it in research or journalism
- It's evidence of something important
It can wait if:
- It's stable reference material
- It's already been archived recently
- You're just bookmarking for later reading
💾 Download & Export
How do I get my results?
Results are emailed to you automatically when processing is complete:
- Email subject: "Wayback Archive Results - X of Y URLs Archived"
- Email contains: Summary statistics and CSV file attachment
- CSV includes: Original URL, archive URL, timestamp, age, and status for each URL, plus the error_reason and retry_count columns
💡 Tip: You can open the CSV in Excel, Google Sheets, or any spreadsheet software for easy organization.
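If you prefer a script to a spreadsheet, the results file can also be processed in a few lines. The sketch below groups failed URLs by their error_reason. The error_reason column is described on this page, but the URL column name `original_url` is an assumption, so check the header row of your own file and adjust it.

```python
import csv
import io
from collections import defaultdict


def failures_by_reason(
    csv_text: str, url_column: str = "original_url"
) -> dict[str, list[str]]:
    """Group URLs that have a non-empty error_reason by that reason."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        reason = (row.get("error_reason") or "").strip()
        if reason:
            grouped[reason].append(row[url_column])
    return dict(grouped)
```

To use it on a downloaded file, read the file's text first, for example with `open("results.csv", encoding="utf-8").read()`, where the filename is whatever your attachment is called.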
Can I use the archive links in my citations?
Absolutely! Wayback Machine links are widely accepted in academic, journalistic, and legal citations. They provide permanent references to web content as it existed at a specific point in time.
💡 Citation format: Include both the original URL and the Wayback link with the archive date.
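Wayback Machine links carry the capture time in the address itself, in the form https://web.archive.org/web/YYYYMMDDhhmmss/original-url. The sketch below uses that to produce the three pieces a citation needs. The output wording is only an example, not a required citation style, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Snapshot URLs look like https://web.archive.org/web/<14-digit timestamp>/<original URL>.
_WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def citation_note(archive_url: str) -> str:
    """Return 'original URL (archived DATE at ARCHIVE URL)' for a snapshot link."""
    match = _WAYBACK.match(archive_url)
    if not match:
        raise ValueError("not a Wayback Machine snapshot URL")
    stamp, original = match.groups()
    archived = datetime.strptime(stamp, "%Y%m%d%H%M%S").strftime("%Y-%m-%d")
    return f"{original} (archived {archived} at {archive_url})"
```

Adapt the final string to whatever style guide you follow (APA, Chicago, Bluebook, and so on).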
ℹ️ About
Who created this tool?
This tool was created by The Open Record to make web archiving more accessible for researchers, journalists, and anyone who wants to preserve online information.
Is this tool affiliated with the Internet Archive?
No. This is an independent tool that uses the Internet Archive's public API. We're grateful for their incredible public service and encourage you to support their mission if you find value in web archiving.
Is this tool free to use?
Yes! This tool is completely free. The Internet Archive's Wayback Machine is also free to use. We built this to make archiving easier for everyone.