Fixing DOAB Links
Keeping the links in DOAB working is a collaborative enterprise of the community. This site checks the links, but without participation from participating publishers, addressing the problems it exposes would be impossible. Information for DOAB publishers is available on the DOAB website.
Types of problems, and how to fix them
When a link is checked we record the status code and content type returned by the web server. This status code gives us hints about what sort of problem is present
- "214" indicates a unescaped redirect location.
- When a server redirects a link it is supposed to send a valid URL in the "location" header so that the web client knows where to go. URL's should have only ascii characters - non ascii characters are url-escaped using %XX to represent non-ascii bytes. We have found that some servers send location strings with characters like "ä" (%E4 or %C3%A4 when escaped) in the location header. Most web client software does a best guess of how to interpret the bytes; so the redirection usually succeeds. ("214" is not a standard HTTP code.)
- "301" or "302" indicates a bad redirect.
- Redirects are used to keep links working after they've changed addresses. CrossRef links, for example, are usually redirected to the publisher's website. But sometimes the redirecting server get it wrong, and there are problems with the resolutions. Another type of problem involves chains of redirects - there might be a loop, or there might be an insecure link in the middle of an other wise secure chain of links - That used to be OK, but now it's an error.
- "403" indicates a misconfigured server that is not allowing access to the promised resource.
- Sometimes this happens when a website is trying to authenticate users, even thought the resource is open access. Check to make sure your server is not trying to authenticate users for open access resources Or perhaps a link for a closed resource has been mistakenly loaded to DOAB.
- "404" means the link is broken - the resource is not found.
- The common causes for this error are:
- The url loaded to DOAB is incorrect. Maybe there's a typo, or perhaps an encoding error (URLs can contain accented characters, whitespace, or symbols, so they need to be escaped or encoded).
- A resource that used to be available has disappeared or has changed addresses.
- "418" means the website was pretending not to be there.
- This is also known as "I am a Teapot". We use this code to describe sites that try to block robots in the most annoying and self-defeating manner possible. Return 503 instead.
- "500" means something has gone wrong at the website server.
- you have a server problem.
- "502" is a gateway error.
- Some websites use load balancers or content distribution networks; if these gateways have a problem connecting with the source website, they send a 502 response. The server needs checking.
- "503" means that a website couldn’t be reached.
- This could happen because the server was too busy, under maintenance, or something else. Amazon's robot blocker returns 503 codes, so these must be checked manually. It may be that your server is blocking users based on the user-agent sent with the request. To make sure that DOAB Check doesn't get blocked, add "doab_check_bot" to your server's allow list.
- "504" indicates that the server, while acting as a gateway or proxy did not get a response in time from an upstream server.
- Some web servers run in protected environments and only talk to the internet via a gateway or a proxy. This means that there is a problem with the communication between the web server and a server in-between.
- "511" indicates a problem with the security of the connection - most often an incomplete certificate.
- The SSL Server Test can help you diagnose this problem. But beware - current we browsers often ignore some security problems, so the link might work when you try to test it. But because the browser vendors are gradually clamping down on weak security, the next update to Firefox or Chrome might start issuing warnings that you website is unsafe. Better to fix it now.
- "524" means the website didn't respond in a reasonable time.
- This might be an intermittent problem. Links get checked every month, so look at the the history of checks to see if that's so. Otherwise, you probably have a server problem.
- "None" or "0" means something has gone terribly wrong. Possibly a bug in the checker or a malformed url.
- If your server admins can't find the problem, have them contact us!
- "200" means that the link operated correctly.
- Unfortunately, this DOESN'T necessarily mean that the link is going to the right place. Checking THAT is beyond the scope of this project at present.