Benutzer:MatthiasDD/temp

From Wikipedia, the free encyclopedia

Improved Parsers

In December 2015 I (MatthiasDD) started working on tasks T29745 ("References in column affect sorting") and T46818 ("jquery.tablesorter should sort plain year digits as date"). Later I also took care of some other tasks connected with correct parser detection.

I now propose changing the parser detection of sortable tables for all types as follows:

  1. References (class='reference') are removed from the sort value.
  2. Text before the sort value is allowed.
  3. Text after the sort value is allowed.
  4. A plain number with 1-4 digits can be detected as number, date, or isoDate, depending on the other content of the column.

So "about 1870[1]" would be detected as a year without using templates.
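Below is a minimal JavaScript sketch of rules 1-4. It is not the code from ts_test.js, and the names stripReferences and plainNumber are hypothetical. It works on plain cell text. In the wiki, references are <sup class="reference"> elements that are removed from the DOM. Here they are assumed to appear as bracketed markers such as [1] or [Ref 2].

```javascript
// Hypothetical helpers illustrating rules 1-4 (names are not from tablesorter.js).

// Rule 1: drop reference markers. Assumption: references appear in the
// plain cell text as bracketed markers like "[1]" or "[Ref 2]".
function stripReferences(text) {
  return text.replace(/\[[^\[\]]*\]/g, '').trim();
}

// Rules 2-4: text may stand before and after the value. A cell whose only
// digits form a single run of 1-4 digits is a "plain number"; whether it is
// sorted as number, date or isoDate must be decided from the rest of the column.
function plainNumber(text) {
  const m = /^\D*(\d{1,4})\D*$/.exec(stripReferences(text));
  return m ? Number(m[1]) : null;
}

console.log(plainNumber('about 1870[1]'));  // 1870
console.log(plainNumber('10.1.2000[12]'));  // null (more than one digit run)
```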

Some additional improvements in parser detection are described below for each parser.

Test script

To test without changing the actual tablesorter.js, you can add the following line to your custom JavaScript:

mw.loader.load("//de.wikipedia.org/w/index.php?title=Benutzer:MatthiasDD/ts_test.js&action=raw&ctype=text/javascript");

After the page has loaded, the current parsers are used. Right-clicking the table header activates the new parsers proposed here. It also switches the table into my diagnostic mode. In this mode it shows all sort values of that row in a message box and colors each cell's background according to the parser detected for that cell. It also writes a title attribute for each cell containing the detected parser (in brackets) and the parser actually used (second line).

Details for each parser

IPAddress

Adds support for the IP/CIDR format (solves phab:T36475).

IPAddress 1 | IPAddress 2
45.238.27.109/32 | 111.255.333.444
45.238.27.109/8[2][3] | 1.202.203.204
45.238.27.109 [4] | 1.022.033.044
usual 204.1.132.158/24[Ref 1] | 1.2.3.4
a: 204.38.0.0/24 | 1.202.203.4
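Below is a sketch of how an IP/CIDR value could be turned into one numeric sort key. The function name ipSortKey and the key layout are hypothetical. Each octet is zero-padded to three digits so that out-of-range values like 333 in the table still sort. The prefix length follows as two digits, and an address without a suffix is assumed to sort like /32. Text before the address is skipped because the regular expression is not anchored.

```javascript
// Hypothetical IPAddress sort key with optional /CIDR suffix.
// Assumptions: references already removed; first dotted quad in the cell is used;
// a missing prefix length counts as /32.
function ipSortKey(text) {
  const m = /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?:\/(\d{1,2}))?/.exec(text);
  if (!m) return null;
  // "1.022.33.4" -> "001022033004": fixed width keeps numeric order = address order
  const octets = m.slice(1, 5).map(o => o.padStart(3, '0')).join('');
  const prefix = (m[5] || '32').padStart(2, '0');
  return Number(octets + prefix); // at most 14 digits, exact as a double
}

console.log(ipSortKey('a: 204.38.0.0/24')); // 20403800000024
```

With this layout, 45.238.27.109/8 sorts directly before 45.238.27.109 and its /32 form, and differently formatted octets such as 1.022.033.044 compare correctly.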

currency

Because text before and after the sort value is allowed, currency values are now handled by the number parser. Any other currency symbols or text are possible there as well. This parser can be deleted.

url

It has not worked since 2011-04-14 because the RegExp was /^(https?|ftp|file):\/\/$/. The $ means the input must end with ://, which is never the end of a URL. I have changed this, but I would say this parser can be deleted. See T47161 "Kill all non-trivial parsers in $.tablesorter".
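The quick check below shows the effect of the trailing $. The quoted RegExp matches only the bare scheme "https://" and never a real URL. The pattern fixedUrl is just the minimal change (dropping the $) and is an illustration, not necessarily the change made in the proposed code.

```javascript
// The RegExp quoted above: anchored at both ends, so it only matches "scheme://"
const oldUrl = /^(https?|ftp|file):\/\/$/;
// Illustrative minimal fix: without "$" only the scheme prefix is required
const fixedUrl = /^(https?|ftp|file):\/\//;

console.log(oldUrl.test('https://www.wikipedia.org/'));   // false
console.log(oldUrl.test('https://'));                     // true
console.log(fixedUrl.test('https://www.wikipedia.org/')); // true
```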

isoDate

  • A time without Z was parsed as local time. That was wrong, and it is now treated as UTC.
  • The handling of years 70...99 is fixed.
  • Short forms are possible: YYYY, YYYY-MM, and, only with data-sort-type="isoDate", YYYYMM and YYYYMMDD
isoDate 1 | isoDate 2 (data-sort-type="isoDate") | isoDate 3
71-01 [5] | 197001 | 0007-07
1970-01-23T03:20Z[6] | 19700123T0320Z | -8-08-08
1970-01-23T03:20+05:00[7] | 19700123T0320-0500 | +9999-12
1970[8] | 1970 | -9999
1970-01-23T03:20:00,111[9] | 19700123T03:20:00.111 | +60-06T10:00:00-02
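Below is a sketch of an isoDate sort key (milliseconds since 1970-01-01 UTC) that follows the bullets above. A time without a zone is taken as UTC. Years 0-99 are set with setUTCFullYear, because Date.UTC would map them to 1900-1999. I am assuming that this mapping is the year 70...99 issue, but the text does not say so. The function name is hypothetical, and the compact forms (YYYYMM, YYYYMMDD, 19700123T0320Z) are not handled in this sketch.

```javascript
// Hypothetical isoDate sort key: YYYY, YYYY-MM, YYYY-MM-DD with optional
// Thh:mm[:ss[.,fff]] and zone (Z, +hh:mm, -hhmm, -hh). Compact forms not handled.
function isoDateSortKey(text) {
  const m = /([-+]?\d{1,4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?(?:T(\d{1,2}):(\d{2})(?::(\d{2})(?:[.,](\d+))?)?(Z|[-+]\d{2}(?::?\d{2})?)?)?/.exec(text);
  if (!m) return null;
  // Missing parts default to January 1st, 00:00:00.000, UTC
  const [, y, mo = '1', d = '1', h = '0', mi = '0', s = '0', frac = '0', zone = 'Z'] = m;
  const date = new Date(0);
  // setUTCFullYear keeps years 0-99 as given (Date.UTC would add 1900)
  date.setUTCFullYear(Number(y), Number(mo) - 1, Number(d));
  date.setUTCHours(Number(h), Number(mi), Number(s), Math.round(Number('0.' + frac) * 1000));
  let t = date.getTime();
  if (zone !== 'Z') {
    // Explicit offset: convert local time to UTC by subtracting it
    const z = /([-+])(\d{2}):?(\d{2})?/.exec(zone);
    const offsetMinutes = (z[1] === '-' ? -1 : 1) * (Number(z[2]) * 60 + Number(z[3] || 0));
    t -= offsetMinutes * 60000;
  }
  return t;
}

console.log(isoDateSortKey('1970-01-01T05:00+05:00')); // 0
```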

usLongDate

Do we really need this parser? In my opinion it should be removed later.

date

  • RegExp inside '[]' (ECMAScript ClassAtomNoDash): only the source characters \, ] and - must be escaped.
  • A non-breaking space (\xa0) is allowed as a (single or additional) date separator.
  • For a written month name (m), the following forms are possible: dm, dmy, m, md, my, mdy
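The sketch below illustrates the first two bullets with a numeric day.month.year pattern. In the separator class, ".", "/" and space need no escape, and "-" is placed last so it is literal. The \xa0 escape only writes the non-breaking space readably. The names dmy and dmySortKey are hypothetical, and only 4-digit years are covered here.

```javascript
// Hypothetical d.m.yyyy matcher; separators: . / space NBSP - (one or more)
const dmy = /(\d{1,2})[./ \xa0-]+(\d{1,2})[./ \xa0-]+(\d{4})/;

// Sort key yyyymmdd, e.g. 10.1.2000 -> 20000110
function dmySortKey(text) {
  const m = dmy.exec(text);
  return m ? Number(m[3]) * 10000 + Number(m[2]) * 100 + Number(m[1]) : null;
}

console.log(dmySortKey('about 2.\u00a01.\u00a02000')); // 20000102
```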

A year alone (1-4 digits) can also be detected as a date. When the parser for a column is detected, such a cell is treated as an empty cell. The date parser is then used if 5 other cells are found with a date, or if data-sort-type="date" is set in the table header.

date 1 | date 2 | date 3 | Month and day | date 5
2000 | 1. 1.00 | 1. Jan. 2000 | 1. Jan. | Januar, 1 2000
2015[10][11] | 10.1.2000[12] | 10. Jan. 2000[13] | 10. Jan.[14] | Jan. 10 2000
about 2010 [15] | about 2. 1. 2000 [16] | Jan. 2000 [17] | Jan. 2. [18] | 01 22 2000
ca. 2020[Ref 2] | ~ 2. 1. 2000[Ref 3] | ca. Jan. 2000[Ref 4] | ca. Jan.10.[19] | 5.12.1990
ca.2030[20][21] | ~ 2000[22] | ca.2000[23] | Jan | December 12 '10
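Below is a rough sketch of the column rule described before the table. Empty cells and plain 1-4 digit numbers are skipped. A header type wins, and otherwise the first parser that matches 5 of the remaining cells (or all of them, if there are fewer) is chosen. The function chooseColumnParser and the callback detectCell are hypothetical stand-ins for the real per-cell detection. The fallbacks ('number' when only plain numbers remain, 'text' when no parser reaches the threshold) are my assumptions.

```javascript
// Hypothetical column detection. detectCell(text) returns e.g. 'date', 'number', 'text'.
// Assumes references and surrounding text were already removed from each cell.
function chooseColumnParser(cells, detectCell, headerSortType) {
  if (headerSortType) return headerSortType;          // e.g. data-sort-type="date"
  // Plain 1-4 digit numbers fit number, date and isoDate: treat them like empty cells
  const candidates = cells
    .map(c => c.trim())
    .filter(t => t !== '' && !/^\d{1,4}$/.test(t));
  if (candidates.length === 0) return 'number';       // assumption: only plain numbers
  const needed = Math.min(5, candidates.length);
  const counts = {};
  for (const t of candidates) {
    const p = detectCell(t);
    counts[p] = (counts[p] || 0) + 1;
    if (counts[p] >= needed) return p;
  }
  return 'text';                                      // assumption: no parser reached the threshold
}
```

For example, a column with one bare year and five cells detected as 'date' gets the date parser. A column containing only bare years falls back to 'number'.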

time

Extended to the format hhhh:mm:ss[.,]ssss ("932:20" was previously sorted in Firefox 40 as 947968500000 and in Internet Explorer 8 as 0); see TOI (Time On Ice).

time 1 | time 2 | time 3
9:59:59 | 9:59:59,999 | 00:43 Uhr
ca. 8:00 pm[24] | 9:59:61 | 00:26 Uhr
~ 20:00:00,001 [25] | 9:59 | 932:20
8760:00[Ref 5] | 0:00:00 |
10:00:00.5 am | 10:00:00.01 |
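Below is a sketch of a time sort key in seconds for the extended format hhhh:mm:ss[.,]ssss with an optional am/pm. Because the hour field allows up to four digits, "932:20" becomes a plain, large number instead of being handed to date parsing. The function name is hypothetical, and so is the 12 am/pm handling. Text before and after the time is skipped because the pattern is not anchored.

```javascript
// Hypothetical time sort key in seconds: h{1,4}:mm[:ss[.,fraction]] [am|pm]
function timeSortKey(text) {
  const m = /(\d{1,4}):(\d{2})(?::(\d{2})(?:[.,](\d+))?)?\s*(am|pm)?/i.exec(text);
  if (!m) return null;
  let h = Number(m[1]);
  const min = Number(m[2]);
  const sec = Number(m[3] || 0) + Number('0.' + (m[4] || '0')); // "," or "." fraction
  const ampm = (m[5] || '').toLowerCase();
  if (ampm === 'pm' && h < 12) h += 12;   // 8:00 pm -> 20:00
  if (ampm === 'am' && h === 12) h = 0;   // 12:30 am -> 0:30
  return h * 3600 + min * 60 + sec;
}

console.log(timeSortKey('932:20'));          // 3356400
console.log(timeSortKey('ca. 8:00 pm[24]')); // 72000
```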

number

  • digitTransformTable is used for both the standard and the scientific format.
  • Text before the number may contain any character except -, +, − and digits (solves phab:T65055).
  • Numbers can contain space characters (U+0020 and other spaces) and ' as digit-group separators, e.g. 3 0 0 0'0
  • Infinity is supported as ∞ with an optional sign (+, - or −).
  • Scientific notation is supported; a number must stand before [·×⋅]10^? (see fa:نماد_علمی, Persian: Scientific notation).
  • Empty cells are sorted to the end using Number.MAX_VALUE (only ∞ is larger).
  • Text cells without a number are sorted after the number 10000 (solves phab:T123364).
all == 1 | number 2 | number 3 | scientific 4 | number 5 (data-sort-type="number") | text
1. place | ~ 20 € | 3 0 0 0'0 | 5.1 · 10^1 | 9999 | abc[26]
100e-2[27] | ~ 20.5€ | -5e-323 | 5.0 × 10^1 | 4 | ab[28][29]
1 apple | $3[30] | ca. ∞[31] | 3.9 × 10^-3 | A | ab[32] c
about 1 | about 1 m | −∞[33] | ۳,۹ × ۱۰^−۳ | ##[34] | e09
10 ⋅ 10^-1 | 1.1¢ | 1.79e308 | | # | e-09
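Below is a sketch of number extraction following the bullets above. Leading text may be anything but a sign or digit. Spaces and ' are digit-group separators, ∞ may carry a sign, scientific notation uses [·×⋅]10^exp, and empty cells map to Number.MAX_VALUE. The function name and the null result for text-only cells are hypothetical. How text cells end up after 10000 is left to the caller. It assumes ASCII digits (no digitTransformTable mapping, so the Persian example is not covered) and "." as decimal separator.

```javascript
// Hypothetical number sort key. Assumptions: ASCII digits, "." decimal separator,
// references removed. Returns null for cells without a number (text cells).
function numberSortKey(text) {
  const t = text.trim();
  if (t === '') return Number.MAX_VALUE;               // empty cells sort to the end
  const m = /^[^\d+\-−]*([+\-−]?)\s*(∞|[\d '\u00a0]*\.?\d+(?:e[+\-]?\d+)?)(?:\s*[·×⋅]\s*10\^?([+\-−]?\d+))?/i.exec(t);
  if (!m) return null;                                 // text only
  const sign = m[1] === '' || m[1] === '+' ? 1 : -1;   // '-' and '−' (U+2212) mean minus
  if (m[2] === '∞') return sign * Infinity;
  let value = Number(m[2].replace(/[ '\u00a0]/g, '')); // drop digit-group separators
  if (m[3] !== undefined) value *= 10 ** Number(m[3].replace('−', '-'));
  return sign * value;
}

console.log(numberSortKey("3 0 0 0'0")); // 30000
console.log(numberSortKey('~ 20.5€'));   // 20.5
```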

Problems with Templates

Some templates put an invisible sortkey before the cell content. The current parsers sort it alphanumerically, but the newly proposed parsers detect a number or another type in this sortkey. The following table shows the invisible text in brown and should be extended as needed:

from in the past text now to do
en:Help:Sorting e09 1.01 more than 1e9 date write numbers: 1.01 e09 more than 1e9
en:Template:dts 000000002015-01-01-0000January 1 2015 number: 2015 output isoDate: 2015-01-01T0000January 1 2015
en:Template:dts -999999999911-01-01-000089 BC number: -999999999911 output isoDate: -89-01-01T000089 BC

Good examples, solved Templates

from Output now Remark
en:Template:ntsh 7004123456000000000♠1.23456×10^4 number: 7004123456000000000 The ♠ separates the sortkey and the number correctly!


References
  1. These references only demonstrate how sorting works
  2. 2
  3. 3
  4. 4
  5. 1
  6. 2
  7. 3
  8. 4
  9. 5
  10. 1
  11. 1
  12. 1
  13. 1
  14. 1
  15. 1
  16. 1
  17. 1
  18. 1
  19. R1
  20. 1
  21. 4
  22. 1
  23. 1
  24. 1
  25. 2
  26. 4
  27. 10
  28. 12
  29. 13
  30. 1
  31. 1
  32. x
  33. 4
  34. 1
  1. Remark
  2. 1
  3. 1
  4. 1
  5. 1