Benutzer:Frog23/Dead Link Finder/en


To view this documentation in German, go to Benutzer:Frog23/Dead Link Finder/de.

The German documentation for the Greasemonkey version (the previous version) of this script can be found at Benutzer:Frog23/Dead Link Finder (Greasemonkey Script).

Basic Idea

Datei:WP-Dead-Link-Finder-1.png
A dead link was marked

Dead links to external websites are quite annoying. Usually they are only noticed when somebody wants to use or check the external source and clicks on the link. To address this problem, I have written a script that automatically checks all external links on the current wiki page and indicates which links, if any, are dead. This script is called DeadLinkFinder. Someone reading an article is presumably already looking into its topic, which lowers the inhibition threshold for fixing a dead link. Fixing a link usually takes only a few minutes but is quite important for the quality of an article.

Installation

There are 3 important notes everybody should read before installing the DeadLinkFinder:

  • Beta-Test: The DeadLinkFinder is currently in a beta phase. It usually works quite well, but occasionally small errors or strange behavior can occur. However, those are only errors in how the current page is displayed. The script is not able to edit any article or cause errors in your browser. If you notice strange behavior of the script, please report it on the discussion page of this article.
  • Browser-Compatibility: The script only works in modern browsers that support HTML5; it does not work with Internet Explorer.
  • Privacy: By default, the script only runs on demand, which means that the user has to click a link to have all external links on the current page checked. However, it is possible to change the settings so that the script runs on every page (for more information on how to make the script run automatically, see the section Modes of Operation). In both cases (but mainly when checking the links automatically) some privacy concerns may arise.
    To limit the traffic this script causes on other websites, each link is only checked once a day and the result is cached. The checking and caching are not done by the script itself, but by a tool called HeaderProxy, which runs on the Wikimedia Tool Labs server (for more details see below). If the same link is checked twice within a short period of time, the cached result is returned. All requests are stored with the requested link, the time, the wiki article the request was made from and the result. Those logs will be used to further improve the script and later to generate statistics. A site that shows the most recently found dead links is planned for the future. No information related to individuals is stored! However, in the beginning, when there are only a few beta testers, it might be possible to relate individual links to specific persons. When the script's settings have been changed to run automatically on every page, the external links of every page visited by the user are sent to the server. Because of that, it could be possible to see which pages the user has visited and, based on that, to draw conclusions about the private life of that user. Just to point it out again: later on (when there are enough users that it is no longer possible to relate checked links to individual users) a list of all discovered dead links will be made available to the public. Only I, as the developer, have full access to all stored information, and I will NOT use it to draw any conclusions about any person. However, I still wanted to point out the privacy concerns that could arise when using this script.

To add the script to your Wikipedia account, just add the following line to the JavaScript-page of your Skin:

mw.loader.load("//tools.wmflabs.org/deadlinkfinder/script.js");

Besides that, there are also several ways to adapt the script to your needs. For more information about that, see the section Settings.

Functionality

Modes of Operation

By default, the script only runs on demand. This means that, to check all external links on a particular page, the user has to click the link "check links", which is displayed in the Toolbox menu.

Datei:WP-Dead-Link-Finder-7-Start-Links en.png
the links for starting the script

It is also possible to run the script automatically on every page the user visits. To have the DeadLinkFinder check the links automatically, the user has to click the link "always check links", which is also displayed in the Toolbox menu. The automated checking can be stopped at any time by clicking the link "stop checking links", which appears in the Toolbox menu once automated checking is enabled. The setting for this mode is stored in a cookie; therefore, cookies need to be enabled to use this option. The cookie expires after 30 days. For people who clear their cookies regularly, or who do not want an expiring cookie to stop the automatic checking of all external links, there is another option for always checking all external links. Just add the following line to the JavaScript page of your skin:

var deadLinkFinder_runAlways = true;

Both of the described ways of always checking all external links are still subject to namespace filtering. This means that by default only the links on pages in the default namespace are checked. In all other namespaces, the link for on-demand checking is displayed. To change the settings for the namespace filter, see the section Namespace Filter.
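The two ways of enabling the automatic mode can be pictured with the following minimal sketch. The cookie name `dlf_alwaysCheck` and all helper functions are hypothetical and not taken from the actual script; only the 30-day lifetime and the `deadLinkFinder_runAlways` variable come from the description above.

```javascript
// Hypothetical sketch: the cookie name and helpers are assumptions,
// not the script's real code.
const COOKIE_NAME = "dlf_alwaysCheck";
const THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60;

// Build the string that would be assigned to document.cookie
// to enable the automatic mode for 30 days.
function buildAlwaysCheckCookie() {
  return COOKIE_NAME + "=1; max-age=" + THIRTY_DAYS_IN_SECONDS + "; path=/";
}

// Look for the flag in a cookie string such as the value of document.cookie.
function cookieEnablesAlwaysCheck(cookieString) {
  return cookieString
    .split(";")
    .map((part) => part.trim())
    .includes(COOKIE_NAME + "=1");
}

// Automatic mode is on if either the skin variable or the cookie says so.
function isAlwaysCheckEnabled(runAlwaysVariable, cookieString) {
  return runAlwaysVariable === true || cookieEnablesAlwaysCheck(cookieString);
}
```

In a browser, `isAlwaysCheckEnabled(window.deadLinkFinder_runAlways, document.cookie)` would combine both sources. The skin variable does not expire, which is why it suits users who clear their cookies regularly.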

Checking if a link is dead

Datei:WP-Dead-Link-Finder-2.png
2 dead links have been found

After a Wikipedia page is loaded, the script is executed. It selects all external links within the article and checks every one of them. However, for security reasons (same-origin policy), JavaScript is restricted to communicating only with the server it was loaded from. HTML5 provides an exception to this rule: a script may communicate with a foreign server if that server explicitly permits it (Cross-Origin Resource Sharing). Instead of contacting the servers of the external links directly, the script asks a special tool on the Wikimedia Tool Labs server. This tool is called HeaderProxy and allows access from all Wikimedia projects. Because it can call the requested URL directly, it checks the link given by the script: it establishes a connection to the corresponding server and requests the page. The entire page is not loaded; instead, the connection is terminated after the HTTP header has been received. The HeaderProxy then checks the transmitted HTTP status code. If the code is 200 (OK), the link is valid. If the code is in the 300 range, it is a redirection, and the proxy follows it, if a new link is given, and checks the new link. For all other status codes, the link is considered dead. The HeaderProxy sends the status code and its description back to the script. Additionally, the link, the corresponding wiki page, the result and the date are stored in a database on the Tool Labs server. If the same link is checked again within the next 24 hours, the cached result is returned to avoid unnecessary traffic to the foreign server. The stored data will also be used for improving the script and for later statistical evaluation.
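The decision rules described above can be summarized in a small sketch. It is not the HeaderProxy's source code, which is not published here; the function names, the `lookup` callback, the redirect limit and the handling of a redirect without a new link are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the rules in the text, not the real HeaderProxy.
const CACHE_LIFETIME_MS = 24 * 60 * 60 * 1000; // cached results are reused for 24 hours

// Classify a status code as described: 200 is valid, 3xx is a redirect,
// everything else counts as dead.
function classifyStatus(code) {
  if (code === 200) return "ok";
  if (code >= 300 && code < 400) return "redirect";
  return "dead";
}

// A cached result may be returned while it is younger than 24 hours.
function isCacheFresh(checkedAtMs, nowMs) {
  return nowMs - checkedAtMs < CACHE_LIFETIME_MS;
}

// Follow redirects until a final verdict is reached.
// `lookup(url)` is an assumed callback returning { status, location }.
// Assumptions: a redirect without a new link, or more than `maxRedirects`
// hops, is reported as dead.
function resolveLink(url, lookup, maxRedirects = 5) {
  let current = url;
  for (let hop = 0; hop <= maxRedirects; hop++) {
    const { status, location } = lookup(current);
    const verdict = classifyStatus(status);
    if (verdict !== "redirect") return { url: current, status, verdict };
    if (!location) return { url: current, status, verdict: "dead" };
    current = location;
  }
  return { url: current, status: null, verdict: "dead" };
}
```

Following the text literally, success codes other than 200 (for example 204) end up in the "dead" branch of this sketch; whether the real tool treats them that way is not stated.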

Marking of dead links within the article

If there is a dead link, the script indicates it by adding a little warning icon directly next to the link. The status code is also written next to it. The description of the status code can be seen by holding the mouse over the icon or the code. The descriptions are taken directly from the server, which means they can sometimes differ from the ones in the specification (because of translations or different wordings).

If there are one or more dead links on a page, a big warning icon is displayed in the bottom right corner of the browser window. Next to it, the number of dead links found so far is displayed. If this number is red, not all links have been checked yet. After all links have been checked, the number turns black. Clicking the warning icon makes the browser jump to the first dead link; each further click jumps to the next dead link found so far. Clicking the small X next to the warning icon closes the info box.

Settings

The script is designed to disturb users as little as possible in their normal interaction with Wikipedia. It only displays something if dead links are found. However, there are several ways to adjust the script to your own personal needs.

Datei:WP-Dead-Link-Finder-6-Waiting-Icon.png
The Waiting Icon. The Script is still running but no dead links have been found so far.

Waiting Icon

To see whether the script is still checking links, you can have it show a small waiting icon. Just add the following line to the JavaScript page of your skin:

var deadLinkFinder_showWaitingIcon = true;

OK Icon

Datei:WP-Dead-Link-Finder-3.png
Everything is OK. There are no dead links in this article.

As an alternative or in addition to the waiting icon, it is also possible to have the script indicate when all links have been checked and no dead links were found. Just add the following line to the JavaScript page of your skin:

var deadLinkFinder_showOk = true;

By default the OK icon will be slowly faded out after 3 seconds. With the line:

var deadLinkFinder_fadeOk = false;

it remains visible until the user clicks on it to make it disappear.

Browse-Mode

Datei:WP-Dead-Link-Finder-5-Browsemode.png
The link to start the Browse mode (in German)

It is possible to have the script run automatically and, as long as no dead links are found, jump to a random page, which it then checks. It stops as soon as the first dead link is found. This mode is called Browse-Mode. To use it, add the following line to the JavaScript page of your wiki-skin:

var deadLinkFinder_showBrowsemodeLink = true;

This will display the link "start Browsemode" in the tools menu. After clicking the link, the Browse mode starts and the link turns into "stop Browsemode". With this link, the Browse mode can be stopped even if no dead links have been found.

In order to use the Browse mode, cookies must be accepted.

Language

The few texts displayed by the DeadLinkFinder can be shown in several languages. By default, the language of the wiki user interface is used. It is, however, possible to explicitly specify another language for the DeadLinkFinder. To do so, add the following line to the JavaScript page of your wiki-skin:

var deadLinkFinder_language = 'de';

(In this example, the language is set to German. Change it according to your needs.) If the requested language is not available, the language of the user interface is used, and if that language is not available either, English is used.

Currently only the languages German (de) and English (en) are implemented.
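The fallback order described above can be sketched as follows. The function name and its parameters are hypothetical and only illustrate the rule; they are not taken from the script.

```javascript
// Hypothetical sketch of the language fallback; not the script's real code.
const AVAILABLE_LANGUAGES = ["de", "en"]; // the two languages named in the text

// Order: explicitly requested language, then the language of the
// wiki user interface, then English.
function resolveLanguage(requested, uiLanguage, available = AVAILABLE_LANGUAGES) {
  if (requested && available.includes(requested)) return requested;
  if (uiLanguage && available.includes(uiLanguage)) return uiLanguage;
  return "en";
}
```

For example, `resolveLanguage("fr", "de")` falls back to the interface language and yields `"de"`, while `resolveLanguage("fr", "it")` ends at `"en"`.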

Namespace Filter

The namespace filter is only relevant if one of the two options for automated checking is enabled (see section Modes of Operation). By default the automated checking is only done for articles in the default namespace (i.e. the regular articles). It is possible to specify which namespaces should be checked automatically. In order to use it, add the following line to the JavaScript page of your wiki-skin and add the numbers of the namespaces you want to include:

var deadLinkFinder_namespaceFilter = [0,2];

(In this example, the default namespace (0) and the User namespace (2) are enabled. Change it according to your needs.) The defined namespaces can vary between wiki installations.
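How such a filter could be applied is shown in the following minimal sketch. The function name and the default constant are hypothetical; the sketch only assumes that the filter is a list of namespace numbers, as in the example line above.

```javascript
// Hypothetical sketch of a namespace check; not the script's real code.
const DEFAULT_NAMESPACE_FILTER = [0]; // by default only the article namespace

// Decide whether links on the current page are checked automatically.
// In MediaWiki, the current namespace number is available through
// mw.config.get("wgNamespaceNumber"); here it is passed in as a parameter
// so that the function can be tried outside a wiki page.
function shouldAutoCheck(namespaceNumber, filter = DEFAULT_NAMESPACE_FILTER) {
  return filter.includes(namespaceNumber);
}
```

With the filter `[0,2]` from the example, `shouldAutoCheck(2, [0, 2])` is true and `shouldAutoCheck(4, [0, 2])` is false, so a page in namespace 4 would only offer the on-demand link.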

Troubleshooting / FAQ

What do the numbers next to the warning icons mean?

If a three-digit number is displayed next to the warning icon, this number is the HTTP status code that was returned by the server. If the first character is an X, it is an error code from either the script or the HeaderProxy. Here are the possible codes and their meanings:

  • XX1 Could Not Reach Server: This error occurs if the entire server cannot be reached. This could be a temporary problem, e.g. when the server is currently overloaded.
  • XX2 No Link: This error message is shown if no link has been handed to the HeaderProxy. This error message should never be shown by the script.
  • XX3 Unsupported Protocol: This error message is shown if the script finds links whose protocols are not supported by the HeaderProxy. Currently only http:// and https:// links are supported. Other protocols, such as mailto: or irc://, cause this message. However, in the default settings of the script, those messages are not displayed.
  • XX4 Unknown Error: This error message means that there has been an error in the HeaderProxy or that the other server is taking too long to respond (similar to XX1).
  • XX5 Unknown Script Error: This error message means that there has been an error in the script.

If one of the error messages XX2, XX4 or XX5 is shown, please report it on the discussion page of this script.
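The distinction between HTTP status codes and the X codes can be expressed as a small lookup. This is an illustrative sketch: the code names and descriptions are taken from the list above, while the function name, the shape of the returned object and the handling of unrecognized input are assumptions.

```javascript
// Hypothetical sketch; code names come from the list above, the rest is assumed.
const PROXY_ERROR_CODES = {
  XX1: { text: "Could Not Reach Server", report: false },
  XX2: { text: "No Link", report: true },
  XX3: { text: "Unsupported Protocol", report: false },
  XX4: { text: "Unknown Error", report: true },
  XX5: { text: "Unknown Script Error", report: true },
};

// Interpret the code shown next to a warning icon.
// `report` says whether the text asks users to report the code.
function describeCode(code) {
  const text = String(code);
  if (/^[0-9]{3}$/.test(text)) {
    return { source: "server", text: "HTTP status code " + text, report: false };
  }
  if (Object.prototype.hasOwnProperty.call(PROXY_ERROR_CODES, text)) {
    const entry = PROXY_ERROR_CODES[text];
    return { source: "script or HeaderProxy", text: entry.text, report: entry.report };
  }
  // Assumption: anything else is treated as unrecognized.
  return { source: "unknown", text: "Unrecognized code", report: false };
}
```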

Does this script also work with other languages / other Wikimedia projects?

Yes, this script works with all languages and projects of the Wikimedia Foundation. However, I currently only run and promote it on the German-language Wikipedia, to test it. Later on, I will run and promote it on other languages and projects as well.

How do I stay informed about news and updates to the script?

I will post all relevant information about changes to the Dead Link Finder on this page: Benutzer:Frog23/Dead_Link_Finder/Updates. So just add this page to your watch list and you will always be up-to-date.

How can I support the development of this script?

First of all, anybody who regularly uses Wikipedia can help by installing the script, running and using it for a while, and then giving feedback on the discussion page of this article. Please report how the script is running, what you like and, most importantly, what does not work properly or what can be improved. This is especially important if the script produces strange and unexpected results. For such error reports, please always state which page you were on (with a permalink to the specific version), at which link the error occurred, and which browser (with version number) and skin you are using.

Also you can tell other Wikipedia users about this script and convince them to use it as well.

In the long term, it would be useful to translate this script and the documentation into several other languages, so that it can be used and promoted for other languages and projects. However, I would like to wait a little longer with that.

If you are well versed in JavaScript and Internet Explorer, please feel free to take care of the IE compatibility of the script.

If you are well versed in JavaScript, PHP and/or HTML and want to work directly on the script, please contact me directly.

What other features are planned?

I still have a few things in mind, which I will implement at some point in the future. Of those, the most important ones are:

  • the page mentioned above, which will show the latest dead links found, with pagination, filters etc.
  • support for Internet Explorer


Is the source code available and if so, under which license?

The source code of the script itself is available (just take a look at the file http://tools.wmflabs.org/deadlinkfinder/script.js ) and is released under the GPL-3. The source code of the HeaderProxy will be released in the future (also under the GPL-3); however, I have to make some adjustments beforehand. If you want to use the code on your personal wiki, just contact me via my Discussion Page.

Other

A quite common cause of dead links is the launch of a new version of a website. Usually the old content is still available, but under a new URL. Often there are several links on Wikipedia to such a website. To find other articles with links to the same website, the special page Link Search can be used.