Wikiup:WikiProjekt Vorlagenauswertung/en
This page in German: Wikipedia:WikiProjekt Vorlagenauswertung
This project (codename: Templatetiger) extracts all templates from the database dump. It aims to analyse the variable values contained in the templates, present them in new ways, apply filters, and support other kinds of evaluation.
Startpage: https://de.wikiup.org/wiki/Toollabs:templatetiger
Objectives
The now-dormant Wikidata project is to be revived, and Wikipedia itself is to be prepared for the highly interesting Semantic MediaWiki.
On the one hand, this project intends to facilitate maintenance work (on categories and templates) and projects on Wikipedia; on the other hand, it intends to offer new search capabilities to interested users of Wikipedia. Although such data was previously extracted within the projects Geographical coordinates and Persondata, no similar effort had been made for smaller subjects until now.
Another objective is to demonstrate that template extraction via the parser (realtime data) is sensible and feasible with regard to performance. Moreover, not every template necessarily needs a table of its own.
The project layout is intended to support all the major Wikipedia languages from the start.
Use
The project is used by modifying the URL, so its usability is very limited. To mitigate these limitations, we answer questions about the project on the talk page. Because the amount of data is very large, we ask interested parties for a little patience until the results of a query are displayed. Once a query has completed, however, its results can be used very quickly.
Template selection
To select from the available templates, open one of the following pages:
- template-choice.php|lang=de: templates, most often used ones first.
- template-choice.php|lang=de&az=yes&from=Info: template selection in alphabetical order, beginning with Template:Info...
Table display
It is possible to filter a template by a variable and its value using the URL parameters "where" (the variable name) and "is" (the desired variable value). For example:
displays all people born in London.
displays all mountains that were first climbed during the 19th century.
The query searches for a substring using the SQL operator LIKE with the pattern %...%. Two wildcards are available: "%" matches any number of characters, and "_" matches exactly one arbitrary character. The following query displays all mountains for which a last eruption is noted in the template (volcanoes): https://de.wikiup.org/index.php?title=Toollabs:templatetiger/tt-table4.php&lang=de&template=Infobox%20Berg&where=LETZTE%20ERUPTION&is=_&offset=0&limit=30
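The wildcard behaviour can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of the project's own database. The table "demo", the helper like_filter and the sample values are made up for illustration. Note that SQLite's LIKE ignores case for ASCII letters, which may differ from the database behind the tool.

```python
import sqlite3

def like_filter(values, needle):
    """Return the values that match LIKE '%needle%' (substring search)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE demo (value TEXT)")
    con.executemany("INSERT INTO demo VALUES (?)", [(v,) for v in values])
    rows = con.execute(
        "SELECT value FROM demo WHERE value LIKE ? ORDER BY rowid",
        ("%" + needle + "%",),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

# Invented sample values, including one empty entry.
sample = ["London", "New London", "Londonderry", "Paris", ""]

print(like_filter(sample, "London"))  # plain substring search
print(like_filter(sample, "_"))       # "_" needs one character: empty entries drop out
print(like_filter(sample, "L_nd%n"))  # both wildcards combined
```

Searching for "_" alone therefore selects every entry that is not empty, which is how the volcano query above finds all mountains with any value in the field.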
Regular expressions
Display of all 8000 m mountains by using regular expressions (please copy the internet address into the browser's address bar)
Negating the query
&where...&is=...&not=yes
shows only items that do not fulfil the query but do contain an entry.
Sorting the result
&order=article
sorts the articles alphabetically by article name (Example)
&order=columnname
sorts the articles alphabetically by a selectable column. Shows only articles with an entry. (Example Games sorting by producer)
Sorting works only without a filter.
Selection of Columns
With the parameter &columns=column1,column2,...
only the listed columns are displayed, so the result can be faster and clearer. Example:
https://de.wikiup.org/index.php?title=Toollabs:templatetiger/tt-table4.php&template=Infobox%20See&lang=de&where=&is=&columns=LAGE,MAX-TIEFE
Change of line count
By changing the limit parameter in the URL, it is possible to display more than the default of 30 articles. For security reasons, the maximum is currently limited to 2000.
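Since queries are built by editing the URL, a small helper can assemble them. The following is a sketch only: the function build_table_url and its argument names are hypothetical, the base address and parameter names are taken from the example URLs on this page, and the clamping to 2000 is done on the client side as a convenience.

```python
from urllib.parse import urlencode, quote

BASE = "https://de.wikiup.org/index.php"
MAX_LIMIT = 2000  # upper bound mentioned in the text above

def build_table_url(template, lang="de", where="", is_="", columns=None,
                    offset=0, limit=30):
    """Assemble a tt-table4.php query URL (hypothetical helper)."""
    params = [
        ("title", "Toollabs:templatetiger/tt-table4.php"),
        ("lang", lang),
        ("template", template),
        ("where", where),
        ("is", is_),
    ]
    if columns:
        params.append(("columns", ",".join(columns)))
    params.append(("offset", str(max(0, offset))))
    # Keep the limit between 1 and the documented maximum.
    params.append(("limit", str(min(max(1, limit), MAX_LIMIT))))
    # quote_via=quote encodes spaces as %20; ":", "/" and "," stay readable.
    return BASE + "?" + urlencode(params, quote_via=quote, safe=":/,")

print(build_table_url("Infobox Berg", where="LETZTE ERUPTION", is_="_"))
print(build_table_url("Infobox See", columns=["LAGE", "MAX-TIEFE"], limit=5000))
```

The first call reproduces the volcano query shown earlier; the second requests two columns and has its limit reduced to 2000.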
Re-use of data in spreadsheets
By copying the tables into new spreadsheets, it is possible to hide rows, sort the content or change data fields.
OpenOffice Calc
Besides copy and paste, Calc can link directly to the URL via Insert/Link to external data...
MS Excel
In Excel, the data can be imported via Extras/Web query.
Disadvantages of this procedure
All data field entries are uniformly treated as text. This limits, for example, the sorting of numbers. More than one filter criterion is difficult to apply. It also appears to be difficult to find articles that lack certain field entries.
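The effect of text-only fields on sorting can be illustrated with a short sketch. It assumes German number formatting (dot as thousands separator, comma as decimal separator) followed by an optional unit; the helpers to_number and sort_by_number and the sample depths are invented for this example and are not part of the project.

```python
import re

def to_number(text):
    """Parse the first number in a text field, assuming German formatting (1.234,5)."""
    match = re.search(r"-?\d+(?:\.\d{3})*(?:,\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(".", "").replace(",", "."))

def sort_by_number(values):
    """Sort text values numerically; entries without a number go last."""
    def key(value):
        number = to_number(value)
        return (number is None, number if number is not None else 0.0)
    return sorted(values, key=key)

# Invented sample values as they might appear in a copied column.
depths = ["254", "1.637 m", "72,5 m", "unbekannt"]

print(sorted(depths))          # plain text sort, as a spreadsheet would do it
print(sort_by_number(depths))  # numeric sort after conversion
```

A text sort places "1.637 m" before "254" because it compares character by character, whereas the converted values sort by magnitude.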
Database
Creation
The data in the database is read during the extraction of the geographical data (WikiProject Geographical coordinates). The Perl script was extended for this purpose so that it also reads all text between double curly braces ({{…}}). Templates without variables are ignored. From the remaining templates, the variable names and values are extracted.
The current method reads only templates that are not nested inside another template. If, for example, Template:Coordinate is used in the variable "POSITION=" of Template:Infobox, only the variable value "Template:Coordinate" can be read. Furthermore, comments inside the templates are left out, because they would have made interpreting the data much more difficult.
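The extraction rules described above can be sketched in Python. This is not the project's Perl script, only an illustration under assumptions: all function names are hypothetical, unnamed parameters are numbered 1, 2, 3, and a nested template is kept as raw text in the value of the surrounding variable. Triple braces and other wikitext subtleties are ignored.

```python
import re

def split_top_level(body):
    """Split a template body on '|' that is not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts

def parse_template(body):
    """Return (template, variable, value) rows; templates without variables give none."""
    name, *params = split_top_level(body)
    rows, position = [], 0
    for param in params:
        key, sep, value = param.partition("=")
        if sep and "{{" not in key and "[[" not in key:
            rows.append((name.strip(), key.strip(), value.strip()))
        else:  # unnamed parameter: number it 1, 2, 3, ...
            position += 1
            rows.append((name.strip(), str(position), param.strip()))
    return rows

def extract_templates(wikitext):
    """Collect rows for top-level templates only, after removing comments."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)
    records, depth, start, i = [], 0, None, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                records.extend(parse_template(text[start:i]))
            i += 2
        else:
            i += 1
    return records

# Invented article text for illustration.
sample = """{{Infobox Berg
| NAME = Beispielberg <!-- comment -->
| POSITION = {{Coordinate|NS=47.1|EW=11.2}}
}}
Text {{Stub}} more text."""

for row in extract_templates(sample):
    print(row)
```

In this sample, the comment is dropped, {{Stub}} is skipped because it has no variables, and the nested Coordinate template produces no rows of its own.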
Database layout
Each language version has its own table. For German, this is "pub_tt1_de". Every variable in every template of every article yields one record, so the German Wikipedia currently has 1.9 million records. A record contains:
name | Name of the article |
name_id | ID-number of the article |
tp_name | Name of the template |
entry_name | Name of the template variable (1,2,3,… or name1,name2…) |
value | Value of the template variable |
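To show how one record per variable can be turned back into one table row per article, here is a small sketch using an in-memory SQLite database. The table and column names follow the layout above; the column types, the helper table_for and all sample rows are assumptions made for this example.

```python
import sqlite3
from collections import defaultdict

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pub_tt1_de (
        name       TEXT,     -- name of the article
        name_id    INTEGER,  -- ID number of the article
        tp_name    TEXT,     -- name of the template
        entry_name TEXT,     -- name of the template variable
        value      TEXT      -- value of the template variable
    )
""")

# Invented sample records: one row per variable of each template.
con.executemany(
    "INSERT INTO pub_tt1_de VALUES (?, ?, ?, ?, ?)",
    [
        ("Beispielsee", 1, "Infobox See", "LAGE", "Bayern"),
        ("Beispielsee", 1, "Infobox See", "MAX-TIEFE", "72,5 m"),
        ("Testsee", 2, "Infobox See", "LAGE", "Tirol"),
        ("Beispielberg", 3, "Infobox Berg", "HÖHE", "1.637 m"),
    ],
)

def table_for(con, template):
    """Group the records of one template into {article: {variable: value}}."""
    rows = con.execute(
        "SELECT name, entry_name, value FROM pub_tt1_de "
        "WHERE tp_name = ? ORDER BY name, entry_name",
        (template,),
    )
    table = defaultdict(dict)
    for name, entry_name, value in rows:
        table[name][entry_name] = value
    return dict(table)

for article, fields in table_for(con, "Infobox See").items():
    print(article, fields)
```

Because all templates share one table, no separate table per template is needed; the grouping step rebuilds the columns on demand.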
How to access
If you have an account, you can access the u_kolossos_p database through the Toolserver. This way, you can write your own application.
Data download
There are plans to make the data download possible later.
Project participants
We are looking for people to further optimise the data analysis and to promote the project in the other language versions.
- Snipre 18:29, 8. Jan. 2009 (CET). I'm active on WP:fr (fr:User:Snipre) and I am developing interfaces to make building queries easier.
- …
- …
Project coordinators
We are keen to answer any of your questions.
- User:Kolossos Application programming
- w:en:User:Bgwhite Data extraction