Benutzer Diskussion:Stefan Kühn/Check Wikipedia/Archiv/2009/September
Ideas on the runtime problem
I read about the runtime problem in the section above and gave it some thought. I hope this helps you reduce the runtime while getting the same results. I also hope you do not feel attacked by this and that this public form is acceptable to you. I would like to help, because I find some of the error checks useful too, and fixing the errors improves article quality. Unfortunately I cannot manage to always have the current dump myself. The number of improvement suggestions is also too large for one person. Good luck. Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
- One thing I forgot: hats off to what you have achieved so far. If you want to implement a suggestion, it is best to do it separately from other changes and compare the results (output file or similar). Only then can you be sure that everything is correct (and notice a difference in runtime, which can also get worse). If you think the suggestions are of no use, that is fine; you are the one who has to implement them, and I would not hold it against you. Der Umherirrende 19:19, 1. Sep. 2009 (CEST)
Would it not also work to decide per project whether you need the large dump (All pages, current versions only.) or only the small one (Articles, templates, image descriptions, and primary meta-pages.), and select accordingly? For en that would halve the runtime (I assume they have no special namespace) --Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
If you search for something with foreach, you should leave the loop early once it has been found. According to this page, that works with last (I have no knowledge of Perl programming). Some of the ifs inside loops could then be slimmed down as well. --Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
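The early-exit idea could look roughly like this. It is a sketch in Python rather than the script's Perl; `break` plays the role of Perl's `last`, and the function name and sample data are invented for illustration.

```python
def find_first(items, wanted):
    """Return the index of the first match, leaving the loop once it is found."""
    found_at = -1
    for index, item in enumerate(items):
        if item == wanted:
            found_at = index
            break  # Perl equivalent: last;
    return found_at


# Hypothetical sample data, only to show the call.
namespaces = ["Talk", "User", "Wikipedia", "Template"]
print(find_first(namespaces, "User"))
```

Because the loop stops at the first hit, no further `if` inside the loop is needed to guard against later matches.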
I would do the namespace checks at the start, right after the article has been read, and not inside the individual error checks. If the article is not in a relevant namespace, there is no need to parse the wikitext at all, since everything would be discarded unused anyway. A further advantage is that you can control the namespaces more easily for individual projects. (In the initialisation phase for the current project, define the relevant namespaces in an array that can then be checked against. For example, namespace 104 might suddenly be of no interest in other projects.) Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
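A minimal sketch of such an up-front namespace filter, again in Python and not taken from the actual script. The per-project table, the namespace numbers in it, and all function names are assumptions made for illustration.

```python
# Hypothetical per-project configuration: namespace numbers worth scanning.
RELEVANT_NAMESPACES = {
    "dewiki": {0, 104},
    "enwiki": {0},
}


def should_scan(project, namespace):
    """Decide right after reading a page whether any check needs to run."""
    return namespace in RELEVANT_NAMESPACES.get(project, {0})


def scan_pages(project, pages, checks):
    """pages: iterable of (title, namespace, wikitext); checks: list of functions."""
    results = []
    for title, namespace, text in pages:
        if not should_scan(project, namespace):
            continue  # skipped before any wikitext is parsed
        for check in checks:
            if check(text):
                results.append((title, check.__name__))
    return results
```

The trade-off raised in the reply still applies: a single project-wide filter is cheaper, while per-error exclusions are more flexible.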
- Great. Many thanks for the tips. Since I consider myself an advanced beginner in Perl, I gladly accept any tip. At the moment the focus is on the new interface, which is being well received. There are already enough errors listed there. But maybe in the long winter evenings I will get around to a real rewrite or a massive restructuring. A program like this usually grows organically, and then that can take quite some time. I think I can get the biggest performance gain from some internal restructuring. I have already taken the dump point into account; I always use only the small ones. I already handle namespaces that way: the namespace is determined at the start, and each error check excludes namespaces individually. I wanted to stay as flexible as possible. I already break out of loops wherever possible. - The overall problem is simply growth. One always has to keep in mind that what still works today may no longer be feasible in three years. That is why I want to move away from the dump towards a kind of live scan that regularly works through, for example, the recent changes in the Wikipedias. In addition, I want to store a date for each article scan so that the same article is not scanned three times a day. But that is still a long way off. -- sk 20:56, 1. Sep. 2009 (CEST)
Error 082 in Finnish wikipedia
All the links starting with [[Wikipedia: (linking to Wikipedia namespace within fi-wiki) are included in the error report. --Jhattara 10:37, 1. Sep. 2009 (CEST)
- IMHO this is an error. We are writing an encyclopaedia and not a Wikipedia project, so every article should contain only links to other articles. Only under this condition can the data be used outside of Wikipedia, for example in a book or in another project. -- sk 11:10, 1. Sep. 2009 (CEST)
- Most of the links to the Wikipedia namespace in Finnish Wikipedia are on the pages for years, decades, and centuries, where there is a link to the discussion about how to write time in Finnish Wikipedia. Those clutter the list beyond any usability. If the link [[Wikipedia:Keskustelua ajan merkitsemisestä Wikipediassa|ajan merkitseminen]] is included in the errors, this error report will remain useless for the Finnish Wikipedia. --Jhattara 09:41, 2. Sep. 2009 (CEST)
- Actually... Just checked that the link to discussion is a redirect. The correct place it should link in Finnish Wikipedia is [[Ohje:Merkitsemiskäytännöt]]. --Jhattara 09:43, 2. Sep. 2009 (CEST)
- I understand the problem; we had the same in dewiki and in other languages. But this link should be on the discussion page or in a comment inside the article, not in the article text. For example: if I read an article about the year 2001, I do not want to read how to write this article. - Soon I will implement a whitelist in the new interface. I hope this will help with these problems. -- sk 10:33, 2. Sep. 2009 (CEST)
DEFAULTSORT (006 and 037)
Like ca.wiki, the Esperanto project has another name for "DEFAULTSORT". We use DEFAUxLTORDIGO, which produces a special letter ("DEFAŬLTORDIGO"). We also have to keep some special letters in the sortkey ("Sahxarov" in the sortkey = Saĥarov). These special letters are allowed in that project: ĉ, ĝ, ĥ, ĵ, ŝ, ŭ and also Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ (in uppercase), because they are different letters from c, g, h, j, s and u. Could they be ignored by errors 006 and 037? If you need the Unicode code points, just let me know. Thanks in advance. Castelobranco 02:52, 7. Sep. 2009 (CEST)
- These letters are written with an "-x" ("cx", "gx", "hx", etc.), but the eo MediaWiki (and, as far as I can see, the Check Wikipedia dump too) recognizes them as diacritics (ĉ, ĝ, ĥ, etc.). Castelobranco 02:57, 7. Sep. 2009 (CEST)
- Many thanks for this info. I will fix this bug. I have put it on my to-do list. -- sk 09:04, 7. Sep. 2009 (CEST)
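One way such an exception could be sketched, in Python rather than the script's Perl. It assumes that errors 006 and 037 react to non-ASCII letters in a sort key or title, which is only inferred from this thread; the function names are invented.

```python
import re

# Esperanto x-system digraphs and the letters they stand for (from the post above).
X_SYSTEM = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
            "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ"}
ALLOWED_EO = set(X_SYSTEM.values())


def from_x_system(text):
    """Convert 'cx', 'gx', ... (and 'Ux' etc.) to the accented Esperanto letters."""
    return re.sub(r"([cghjsuCGHJSU])[xX]", lambda m: X_SYSTEM[m.group(1)], text)


def has_disallowed_special_letters(sortkey):
    """True if the key holds non-ASCII characters other than the Esperanto ones."""
    return any(ord(ch) > 127 and ch not in ALLOWED_EO for ch in sortkey)


print(from_x_system("Sahxarov"))
```

With a per-project allow-list like `ALLOWED_EO`, a key such as Saĥarov would pass on eo.wiki while other accented letters would still be reported.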
Error 61 in ptwiki
In the list for error 61 - Reference with punctuation (4-sep-09) there are some articles that do not have this error, like 105 Lélio Gama St. and 12758 (número). Rjclaudio 14:19, 4. Sep. 2009 (CEST)
- I think this comes from an old dump. If you want to be sure that the error is really in the article, use this new page. There a bot can find all articles from the database that no user has set as "Done". You can set the limit there to 500 and also page through with the parameter "offset". I hope this will help you. -- sk 09:25, 7. Sep. 2009 (CEST)
Could you change in the script the links at "List of all articles with error xxx" to this new url? Rjclaudio 01:38, 9. Sep. 2009 (CEST)
Suggestion for new errors with Defaultsort
Double Defaultsort, and Text after Defaultsort. Rjclaudio 14:19, 4. Sep. 2009 (CEST) 01:47, 6. Sep. 2009 (CEST)
- Double Defaultsort is a good idea. I have put it on the to-do list. But Text after Defaultsort is not possible. I have no good algorithm to detect this in de, en, es or ja, ar ... -- sk 09:16, 7. Sep. 2009 (CEST)
If you can do this for categories, why can't you use the same algorithm? Maybe you can create an error specific to those languages where this is easy to do. Rjclaudio 01:35, 9. Sep. 2009 (CEST)
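The "Double Defaultsort" check could be sketched like this. It is a Python illustration, not the script's code, and it only knows the English magic word; localized names such as DEFAŬLTORDIGO would have to be added per project.

```python
import re

# Matches the start of a {{DEFAULTSORT:...}} magic word (English name only).
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*:", re.IGNORECASE)


def count_defaultsort(wikitext):
    """Number of DEFAULTSORT magic words in the page text."""
    return len(DEFAULTSORT_RE.findall(wikitext))


def has_double_defaultsort(wikitext):
    """True if the page sets its default sort key more than once."""
    return count_defaultsort(wikitext) > 1
```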
Bot-readable updates?
I notice that pages like [1] and [2] haven't been getting updated recently. It would be really nice if those could be updated, in addition to having the new interface, because it's much easier to use a bot when there is a plain-text list of articles to copy. Thanks! -Drilnoth (Talk) 04:54, 9. Sep. 2009 (CEST)
- Problem in frwiki also, all projects seem to have been updated today, but with the scans results made on monday. The new interface is not accessible anymore since yesterday night... -- - Archimëa ✉⇔ 10:24, 9. Sep. 2009 (CEST)
- There was a problem with the SQL server at the toolserver. This problem was fixed, but now the backup will be implemented. See this mail. - On the problem with the error lists: this is a bigger problem. At the moment I am happy that the new interface is running very well and that users are using it. Also, at the moment only the errors from the live Wikipedia (and not from a dump) are in the database. Only new articles and recent changes are scanned and inserted into the database. This is also the reason for the low number of errors in the new interface, maybe only 14000 or so in dewiki. In the dump the script finds over 100000 errors. In the next few days I will create a diagram of the processes so that everyone understands the details. The biggest problem is that after a dump scan sometimes over 300000 articles must be scanned in the live Wikipedia, and this is too much. - For all users with a bot I have implemented an output list in the new interface, and also a function to set all articles as done. I hope this helps. -- sk 10:51, 9. Sep. 2009 (CEST)
- Wonderful, it works. -- sk 13:48, 10. Sep. 2009 (CEST)
- No, Stephan, it seems that it doesn't work for the french wiki: files are dated "1 Sep 09 09:08", but dump is already dump of monday. 79.87.11.144 11:49, 12. Sep. 2009 (CEST)
WikiCleaner
Hi,
I have started working on Wiki Cleaner to add features in it for fixing the errors detected by your script. Version 0.93 is the first one with this. It's not yet functional and I still have a lot of work to do on it, but the basics are visible.
Main things that need to be done:
- Allow editing and saving the contents of the articles
- Highlight detected errors directly in the text of the articles and propose fixes
- Add other errors (currently only errors 48 and 80 are recognized)
- Read complete list of articles on the tool server
If people have comments about this tool, please use my talk page on FR.
--NicoV 14:13, 1. Sep. 2009 (CEST)
- v0.94 is available : the page text is scanned and errors are highlighted directly in the text. Still not functional, since editing and saving are not done. --NicoV 22:16, 1. Sep. 2009 (CEST)
Hi Stefan, I have released v0.95 that allows editing and saving the articles, and also detects other errors (11 types currently). --NicoV 19:45, 4. Sep. 2009 (CEST)
Due to a hosting change, there's a new URL to install WikiCleaner (here). It's better to uninstall the old version before going for the new one (0.97). --NicoV 21:28, 14. Sep. 2009 (CEST)
Hidden auto-redirect
Hello Stephan,
Is it possible to detect "hidden auto-redirects"? By "hidden auto-redirect" I mean a circular link through a redirect page. For example in the French wiki, we have the article fr:Intercalation (mesure du temps), which includes a link (at the end of the article) to fr:Mois intercalaire. That link returns the reader to the first article, fr:Intercalation (mesure du temps), because fr:Mois intercalaire is not an article but only a redirect to fr:Intercalation (mesure du temps).
This error looks like the error 48 with a redirect article.
I hope I have been clear; if not, please let me know. Regards, 79.87.11.144 12:08, 12. Sep. 2009 (CEST)
- I know what you mean, but the script can't handle this problem. It only checks one article at a time, not several. -- sk 21:29, 14. Sep. 2009 (CEST)
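For illustration only, here is how the check might look if a redirect table were available, which the script, per the reply above, does not have. This is a Python sketch; the `redirects` dictionary (redirect title to target title) and the function name are assumptions.

```python
import re

# Captures the link target of [[Target]], [[Target|label]] and [[Target#section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def hidden_self_links(title, wikitext, redirects):
    """Return link targets that are redirects pointing back to the page itself.

    redirects: dict mapping redirect title -> target title (assumed input).
    """
    hits = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target != title and redirects.get(target) == title:
            hits.append(target)
    return hits
```

The table would have to come from a second data source, for example the list of redirects in a dump, since a single article's text does not reveal where its links end up.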
False positives in error #64
The Catalan wikipedia has lots of false #64 positives. See, for instance ca:Argolis or ca:Belau. They all seem to be redirects, and the repeated link is reported to be to the original page, but I can't find it. Can you look into it?
Also, I have a request for error #69 that you probably overlooked. We get the same false positive that the Italian wikipedia has reported for #69 in ca:Lector de codi de barres, and we will get it in ca:ISBN if it gets inspected. Can you please white-list them? --Joutbis 19:47, 15. Sep. 2009 (CEST)
- On the problem with #64: look at the redirect Belau. There you will find the problem. -- sk 21:42, 21. Sep. 2009 (CEST)
- On the problem with #69: I am working on a concept for a whitelist. I hope this will help in the future. -- sk 21:44, 21. Sep. 2009 (CEST)
New error for fixing references
What I have in mind looks like error 081. I often find references with the "name=" parameter where the reference is used only once, so the parameter is useless.
-- - Archimëa ✉⇔ 12:40, 18. Sep. 2009 (CEST)
- Hmm, I write this at the to-do-list. -- sk 21:49, 21. Sep. 2009 (CEST)
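A rough sketch of how such a check might work, in Python and not from the script. The regular expression is a simplification (it does not understand comments or nowiki sections), and the function name is invented.

```python
import re
from collections import Counter

# Finds the name attribute of <ref ...> tags: quoted or unquoted values.
REF_NAME_RE = re.compile(
    r"""<ref\s+[^>]*?name\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))""",
    re.IGNORECASE,
)


def ref_names_used_once(wikitext):
    """Return the sorted list of reference names that occur exactly once."""
    names = Counter()
    for match in REF_NAME_RE.finditer(wikitext):
        # Exactly one of the three groups takes part in a match.
        name = next(g for g in match.groups() if g is not None)
        names[name.strip()] += 1
    return sorted(n for n, count in names.items() if count == 1)
```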
New interface - Graphic bug
Error 031, certainly due to HTML entities... -- - Archimëa ✉⇔ 23:13, 19. Sep. 2009 (CEST)
- I know this problem. The problem is the two ways of display. Wikisyntax and HTML. Maybe I will stop the wikisyntax and only support the webinterface with HTML. -- sk 21:56, 21. Sep. 2009 (CEST)
DEFAULTSORT (037)
Hello Stefan, in the Italian Wikipedia there is the template Bio for biographies. It has a parameter ForzaOrdinamento
("sort order"), which works like DEFAULTSORT and forces a particular sort order. Example: ForzaOrdinamento = Cajkovskij, Petr Ilic
in the article it:Pëtr Il'ič Čajkovskij.
Would it perhaps be possible to take this parameter into account? In the simplest case, articles with Bio
could be excluded from the check. Ideally, your script would search not only for DEFAULTSORT but also for the parameter ForzaOrdinamento. Just a suggestion :) --MaEr 20:27, 17. Sep. 2009 (CEST)
- I am very busy at the moment, so I have to put this off for now. I will put it on the to-do list. -- sk 21:45, 21. Sep. 2009 (CEST)
New interface
Here is the new interface. The basic functionality is implemented, but I think I can add much more soon. If you have ideas for new features, please tell me here. Next I will implement a whitelist and also better updating of the data. -- sk
- The links to Japanese Wikipedia and its translation page are broken. Possibly a character encoding problem? --fryed-peach 10:48, 1. Sep. 2009 (CEST)
- Yes, I have seen this too, also in other languages (ru, ar). I will fix this. -- sk 11:05, 1. Sep. 2009 (CEST)
- Hi, I did some testing and it appears to be handy. No requests, only what I think ;) ... It's clean and "squared"... Page loading times are good. Colours... (tastes differ! perhaps it will be tweakable...)... I don't know how you planned it... an include on each project page would be possible... The done button is awesome! The possibility to have a big output range ( ← 100 bis 125 → for example) is really useful -- Cordialement - Archimëa ✉⇔ 1 septembre 2009 à 14:06 (CEST)
- Hi. Just to be sure : links like this one will still be available ? It's just for tools being able to read the list of errors. --NicoV 14:23, 1. Sep. 2009 (CEST)
- Maybe in the future I will implement this inside the script. So only the link will be change in the future. But the page will be available. Maybe under "&view=bot" or so. Is this ok for you? -- sk 14:28, 1. Sep. 2009 (CEST)
- Yes "&view=bot" should be ok. The idea is just to have a simple list (minimal formatting to have a simple parsing, ideally only a text file with a title per line) with all articles where a specific error has been detected. --NicoV 15:29, 1. Sep. 2009 (CEST)
- (sorry for the bad english)
- Could add a "done" button to mark done in all articles that has a specific error. When using a (semi-)bot, clicking all 100+ "done" is impossible.
- A sortable table by id/description/article/notice.
- In "High priority/Middle/Low", don't show (or show in a separate list, or hide) items that don't have any articles with errors.
- Rjclaudio 21:13, 2. Sep. 2009 (CEST)
- Hello Rjclaudio, to 1) this is a good idea, but sometimes we have vandals. I will check this, but later. To 2) Yes, this will be possible. This is also my next idea. I will try this. First I must fix some basic problems with the database. To 3) Why is this useful? I think it is ok as it is, but I can also exclude these. -- sk 22:13, 2. Sep. 2009 (CEST)
- Could this "done button" at least delete page errors (25 entry) instead of the whole error list ? -- - Archimëa ✉⇔ 13:29, 3. Sep. 2009 (CEST)
- Navigation problem. Example: when I'm fixing problems in an error list (e.g. "Square brackets not correct begin") and I choose "more" for an article, I land on the "article error page" (as I will call it), and then it is hard and time-consuming to go back to the error list I came from (in my example "Square brackets not correct begin")... -- - Archimëa ✉⇔ 13:38, 3. Sep. 2009 (CEST)
- Hello Archimëa, I hope I have fixed this "navigation problem" for you. The line is no longer deleted; only the "done" switches to "ok". So you can go back to the first page. -- sk 08:22, 4. Sep. 2009 (CEST)
I suggest more ways to group the errors. Some projects use "BOT" and "AWB" in the name. If this interface could group in the same table all errors that AWB can fix, it would help a lot and would be a good advantage over the old version. And maybe not only AWB/BOT: the options could be customized in many ways by each project independently. Rjclaudio 13:00, 4. Sep. 2009 (CEST)
- Yes i saw it yesterday night, it's far better and it resolve the problem. Nice -- - Archimëa ✉⇔ 13:21, 4. Sep. 2009 (CEST)
Hi Stefan, a question about the "Done" button. Does it mark the problem as solved (until the next run of Check Wiki ?) so that people fixing errors can work more efficiently ? I am still working on WikiCleaner to provide an interface for fixing the errors (hopefully, a functional version before the end of the week-end), is there a way for my tool to simulate easily the click on the "Done" button ? --NicoV 16:02, 4. Sep. 2009 (CEST)
- @Rjclaudio: If I understand you right, then you want the info AWB/BOT or so for every error number. This is possible, but I need a list from AWB and Bots, maybe for every language? -- sk 16:38, 4. Sep. 2009 (CEST)
- @NicoV: See the Done-Link. You only need to request http://toolserver.org/~sk/cgi-bin/checkwiki/checkwiki.cgi?project=dewiki&view=only&id=30&pageid=4534003 if you fix error 30 for page 4534003 in de. The script then updates the database from ok=0 to ok=1 and nothing more. At the moment, all pages with ok=1 are scanned in the next scan. -- sk 16:38, 4. Sep. 2009 (CEST)
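A small sketch of how a tool could build that Done link, in Python. The base URL and the parameter names project, view, id and pageid are taken from the example above; the function name is invented, and the sketch only builds the URL without sending any request.

```python
from urllib.parse import urlencode

BASE_URL = "http://toolserver.org/~sk/cgi-bin/checkwiki/checkwiki.cgi"


def done_url(project, error_id, page_id):
    """Build the URL that marks one error on one page as done."""
    query = urlencode({
        "project": project,
        "view": "only",
        "id": error_id,
        "pageid": page_id,
    })
    return f"{BASE_URL}?{query}"


print(done_url("dewiki", 30, 4534003))
```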
Just like Wikipedia:Projetos/Check Wikipedia/Tradução, another page to associate error <-> bot/awb/manual/semi-bot/etc. Or using something like "error_091_desc_script=", but a " error_091_clas_script=2" (clas = classification). And just like
#########################
# error description
#########################
# prio = -1 (unknown)
# prio = 0 (deactivated)
# prio = 1 (top priority)
# prio = 2 (middle priority)
# prio = 3 (lowest priority)
do a
#########################
# clas description
#########################
# clas = 0 (manual)
# clas = 1 (awb)
# clas = 2 (bot)
but with unlimited clas (or max 10). Each language would use 3 (manual, awb, bot) or 5 (manual, partial awb, awb, partial bot, bot), or 20.
And it could help integrate each project, working together to create rules for bot/awb to fix similar errors. In pt.wiki we have 52 error that use bot/awb, and maybe other languages have rules for the others. This would help find help in other languages.
Rjclaudio 17:28, 4. Sep. 2009 (CEST)
- Maybe showing rules to awb to fix some the errors. In pt.wiki we made it to some errors, but something universal (that each project would adjust, like changing in the rule "Image" for "Imagem") would be better. Rjclaudio 17:32, 4. Sep. 2009 (CEST)
- Ok, I have fix this bug. -- sk 13:51, 6. Sep. 2009 (CEST)
Hello, I love new interface, but I am also begging for button "all done". :) --Ragimiri 13:37, 7. Sep. 2009 (CEST)
- Ok, I will try to implement this. :-) But with many questions like "You are sure?"-- sk 15:22, 7. Sep. 2009 (CEST)
- Ok, I have implement this function. -- sk 22:03, 7. Sep. 2009 (CEST)
- At the moment all descriptions are bad, because they include wikisyntax and not HTML. I will fix this with a translation page. -- sk 22:03, 7. Sep. 2009 (CEST)
- If I'm right, errors are sorted into high/medium/low based on the script level and not the wiki project level (maybe this will come with including the translation, because the levels are set there?).
- Tables with undefined width are less useful for some errors. When the table is wider than the screen it's a pity... to see how it is, see this error -- - Archimëa ✉⇔ 11:12, 10. Sep. 2009 (CEST)
- Yes, at the moment only the script level will be used. I don't see the problem with the table. I think flexible is ok. Please use new headlines, for new requests. I dont like so long discussions. :-) -- sk 13:48, 10. Sep. 2009 (CEST)
Page moved (eo.wiki)
The esperanto project page was moved from eo:Vikipedio:WikiProjekt Check Wikipedia to eo:Projekto:Check Wikipedia, because of the creation of the namespace Projekto. Should I do something to correct the interwikis? The translate page was also moved from eo:Vikipedio:WikiProjekt Check Wikipedia/Translation to eo:Projekto:Check Wikipedia/Tradukado. Thanks. Castelobranco 03:29, 7. Sep. 2009 (CEST)
- Hello Castelobranco, thanks for this info. I will fix this in the script. Then with the next scan on Wednesday all pages will have the right interwiki link. -- sk 08:55, 7. Sep. 2009 (CEST)
- Ok, I have changed this in the script. -- sk 20:31, 7. Sep. 2009 (CEST)
Error Code 047:
Hello Stefan Kühn,
In my opinion these are not wrong: [3] (Ordinaalgetal) is not a template, but has to do with mathematics. Regards. --Algont 22:44, 7. Sep. 2009 (CEST)
- Fixed. In such cases just wrap it in <nowiki> tags (or <math> tags, if they fit). --xAwOc 22:58, 7. Sep. 2009 (CEST)
- <math></math> would be better. -- sk 08:36, 8. Sep. 2009 (CEST)
Table max width (new interface)
When the table is wider than the screen (my screen is only 19") it's not useful. Not all "Done" buttons are displayed; you must use the horizontal scroll bar... if you have to do this 10, 15, 20 times ("done" -> then H-bar, "done" -> then H-bar, "done" -> then H-bar, etc...) :-(
Whether you see the problem may depend on your screen width. Example (hoping it's wide enough on your screen), but looking at this, it seems to already have a maximum width, no?
Is there no way for it to be based on the OS screen resolution, for example? (I don't know if it's easy to code...!) -- - Archimëa ✉⇔ 16:31, 10. Sep. 2009 (CEST)
- Hello Archimëa, the problem is mostly a single article with a long unbreakable notice, for example "{{Löschantragstext|tag=4|monat=September|jahr=2009|titel=Fachverband…". Once you have done that one, you have a narrower table. -- sk 20:52, 10. Sep. 2009 (CEST)
1 week without update for frwiki
Hello
frwiki_output_for_wikipedia.html has not been updated for one week... All users have drifted away from the project! The last update was made last Monday. Can you do something? -- - Archimëa ✉⇔ 10:20, 14. Sep. 2009 (CEST)
- I will check this tonight. -- sk 17:25, 14. Sep. 2009 (CEST)
- BTW, we have tried to activate a detection on sep 9, perhaps that's the reason for the error ... We de-activate it for next check ... Al1 17:49, 14. Sep. 2009 (CEST)
- I have checked my script and did not find an error. Maybe this activation was the problem. I will check this again. Thanks for this tip. -- sk 21:31, 14. Sep. 2009 (CEST)
- We can hope a (simple) scan for fr this night ? -- - Archimëa ✉⇔ 22:13, 14. Sep. 2009 (CEST)
- A suggestion (maybe a stupid suggestion) : perhaps you should erase or rename the frwiki directory, then create an empty new one ? If it's a disk error due to the crash of last wednesday... Al1 06:57, 15. Sep. 2009 (CEST)
- Yesterday I did not have enough time to check everything. The script runs correctly. You can see on this page when the last update was. But I don't understand why this file is not updated after the run. The mystery is that all other languages are ok. I will check again tonight. -- sk 13:39, 15. Sep. 2009 (CEST)
- I have found the problem. It is inside fr:Orthose. The text "Gordon\'s Mineralogy of Pennsylvania (1922) p. 191" contains a "\'". This causes a problem in the script: when the script inserts something into the database it must escape a ' as \', but here this escape sequence is already in the text. The script stops at this point and does not copy the new pages for frwiki. I will fix this. -- sk 20:54, 15. Sep. 2009 (CEST)
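One common way to sidestep this class of problem is to let the database driver handle quoting through placeholders instead of escaping by hand. The sketch below shows the idea in Python with an in-memory SQLite database; this is not the script's setup (the real database, table, and column names are unknown here and invented for the example), and it is not necessarily how the bug was fixed.

```python
import sqlite3


def store_notice(conn, title, notice):
    """Insert a row using placeholders, so quotes and backslashes need no manual escaping."""
    conn.execute("INSERT INTO errors (title, notice) VALUES (?, ?)", (title, notice))


# Hypothetical table, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE errors (title TEXT, notice TEXT)")

# The text from fr:Orthose, including the literal backslash before the quote.
store_notice(conn, "Orthose", "Gordon\\'s Mineralogy of Pennsylvania (1922) p. 191")
row = conn.execute("SELECT notice FROM errors").fetchone()
print(row[0])
```

Perl's DBI offers the same mechanism through `?` placeholders in prepared statements.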
- Hello! The same problem exists in ruwiki. ruwiki_output_for_wikipedia.html has not been updated since 02.09.2009. Can you fix this as well? --SPKirsch 21:51, 15. Sep. 2009 (CEST)
- I think he is going to handle this \' problem directly in the script for all languages... -- - Archimëa ✉⇔ 21:54, 15. Sep. 2009 (CEST)
- The scan for ruwiki has run through again, as can also be seen in the interface. But this ruwiki_output_for_wikipedia.html is simply not being updated. What is going wrong there? --SPKirsch 22:49, 17. Sep. 2009 (CEST)
- Hello SPKirsch, I am working on it, but I have not been able to eliminate the error yet. I hope to have more time this week to fix the problem. -- sk 09:47, 21. Sep. 2009 (CEST)
- So, I have now changed something and hope that it runs through. We will see. -- sk 21:40, 21. Sep. 2009 (CEST)
Ok, I have fixed the problem. It was a difficult problem, hard to catch. I will describe it: the script uses the API. To make the script faster, I check more than one title at a time. I use a limit of 25 titles because the URL for the API cannot exceed a maximum length. This works very well. But when the script scans a language with non-Latin letters like Cyrillic (ruwiki), it has a problem. The letters must be transformed ("Военно-воздушные силы и войска ПВО Узбекистана" into "%D0%92%D0%BE%D0%B5%D0%BD%D0%BD%D0%BE-%D0%B2%D0%BE%D0%B7%D0%B4%D1%83%D1%88%D0%BD%D1%8B%D0%B5%20%D1%81%D0%B8%D0%BB%D1%8B%20%D0%B8%20%D0%B2%D0%BE%D0%B9%D1%81%D0%BA%D0%B0%20%D0%9F%D0%92%D0%9E%20%D0%A3%D0%B7%D0%B1%D0%B5%D0%BA%D0%B8%D1%81%D1%82%D0%B0%D0%BD%D0%B0") so that the API works with this link. Every byte is transformed into three characters, so each Cyrillic letter becomes six. In most cases this is no problem, but sometimes, if the 25 titles are very long, it is. For example see this API-Request. I have fixed this problem in the script and hope it works. -- sk 21:09, 3. Okt. 2009 (CEST)
- Oops, there is a new problem. I will fix this tomorrow. -- sk 22:40, 3. Okt. 2009 (CEST)
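The batching problem described above can be sketched as follows, in Python rather than the script's Perl. The limit of 25 titles comes from the post; the 2000-character budget for the encoded title string is an assumed value, and the function name is invented. Titles are assumed to be joined with a percent-encoded pipe, which costs three characters.

```python
from urllib.parse import quote


def batch_titles(titles, max_titles=25, max_query_chars=2000):
    """Group titles so that each batch has at most max_titles entries and its
    percent-encoded, pipe-joined form stays within max_query_chars."""
    batches, current, current_len = [], [], 0
    for title in titles:
        encoded_len = len(quote(title, safe=""))
        extra = encoded_len + (3 if current else 0)  # 3 = length of "%7C"
        if current and (len(current) >= max_titles
                        or current_len + extra > max_query_chars):
            batches.append(current)  # close the full batch
            current, current_len = [], 0
            extra = encoded_len
        current.append(title)
        current_len += extra
    if current:
        batches.append(current)
    return batches
```

With short Latin titles the count limit of 25 decides the batch size; with long Cyrillic titles the length budget cuts the batch earlier, which is the case that broke the ruwiki scan.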
- Ok, now it run! I have updated the page in ruwiki. Now I will start the scan of ukwiki. -- sk 10:59, 4. Okt. 2009 (CEST)
- Danke!!! Thank you!!! Well done.--SPKirsch 14:20, 4. Okt. 2009 (CEST)
Error #003 on italian wikipedia
Hi! In it.wiki template:R is a redirect of {{References}}. Some pages that contain {{r}} are listed as errors. Can you insert that template in your script? Thanks! --Beta16 15:30, 21. Sep. 2009 (CEST)
- I have added this. -- sk 22:04, 21. Sep. 2009 (CEST)
- Very fast. Thanks! :) --Beta16 10:42, 22. Sep. 2009 (CEST)
commons
On Commons, user:Rocket000 deleted the sentence „There has to be a space in between "br" and the slash.“ There is no translation page on Commons. In the new interface the page commons:Wikipedia:WikiProject Check Wikipedia/Translation is given as the translation page, which is an interwiki link to en:WikiProject Check Wikipedia/Translation, which does not exist. It should be commons:Commons:WikiProject Check Wikipedia/Translation, but that does not exist either. --xAwOc 11:35, 25. Sep. 2009 (CEST)
- At the moment I have a problem with commons. I hope I can fix this at the weekend. -- sk 17:03, 25. Sep. 2009 (CEST)
- Ok, I have added Commons to the new interface and updated the translation page. -- sk 11:57, 4. Okt. 2009 (CEST)
error 003 on hewiki
this error (article has a <ref> and not a <references />) is identified 2210 times because we often use a template instead of <references />. The template is "הערות שוליים" and it can appear as (read from right to left):
{{הערות שוליים}}
-or-
{{הערות שוליים|anything here}}
can you please fix the script for this ? thanks, Mikimik 22:40, 26. Sep. 2009 (CEST)
- also we use a template - "הערה" - instead of <ref>. thanks again, Mikimik 23:33, 27. Sep. 2009 (CEST)
- Ok, I have added these two templates. -- sk 14:28, 4. Okt. 2009 (CEST)
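A possible shape for such a template-aware check of error 003, as a Python sketch. The two template names are the ones given above; the lists, the helper, and the function name are assumptions, and other projects would need their own entries.

```python
import re

# Templates that stand in for <references /> and <ref> on hewiki (from the post above).
REFERENCES_TEMPLATES = ["הערות שוליים"]
REF_TEMPLATES = ["הערה"]


def _template_re(names):
    """Match {{Name}} or {{Name|...}} for any of the given template names."""
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(r"\{\{\s*(?:" + alternatives + r")\s*(?:\||\}\})")


def missing_references_list(wikitext):
    """True if the text has a reference but no place where references are listed."""
    has_ref = (re.search(r"<ref[\s>]", wikitext, re.IGNORECASE) is not None
               or _template_re(REF_TEMPLATES).search(wikitext) is not None)
    has_list = (re.search(r"<references\s*/?>", wikitext, re.IGNORECASE) is not None
                or _template_re(REFERENCES_TEMPLATES).search(wikitext) is not None)
    return has_ref and not has_list
```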
Dump Scan
Can we have a dumpscan for frwiki ? (a new dump was finished 2 weeks ago)
The errors that are usually fixed are always at around 0 to 10 per scan (75% of the 91 errors). All other errors number in the thousands and we are more or less working on them... -- - Archimëa ✉⇔ 17:34, 30. Sep. 2009 (CEST)
- We have some errors that are not detected: take a look at Liste d'articles non-détectés. Perhaps these articles will be detected with the new dump? -- - Archimëa ✉⇔ 16:56, 3. Okt. 2009 (CEST)
New suggestion: mixed cyrillic/latin letters in a word
Hi Stefan,
Could you extend your script in order to search words that contains cyrillic chars as well as latins? For example sometimes latin A (U+0041) is accidentally replaced by cyrillic А (U+0410). Probably a simple regular expression could recognize this case. Something like this:
$text =~ /[\x{0400}-\x{04f9}][A-Za-z\x{00c0}-\x{00ff}\x{0100}-\x{0233}]/;
$text =~ /[A-Za-z\x{00c0}-\x{00ff}\x{0100}-\x{0233}][\x{0400}-\x{04f9}]/;
-- Bitman 07:04, 30. Sep. 2009 (CEST)
- I have never seen a regexp like this. Does this work? Very interesting. At the moment I am working on the translation of the new interface. After that I will try your idea. -- sk 21:18, 5. Okt. 2009 (CEST)
It is like /[a-z][A-Z]/ but uses Unicode chars. This is a code snippet of my bot that repairs error 16: hu:User:GumiBot/code16. Yes, it works. :-) --Bitman 15:36, 9. Okt. 2009 (CEST)
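For comparison, the same idea as the two Perl expressions above, written as a Python sketch with the same character ranges. The function name and the word splitting with `\w+` are additions for illustration.

```python
import re

CYRILLIC = "\u0400-\u04f9"
LATIN = "A-Za-z\u00c0-\u00ff\u0100-\u0233"

# A Cyrillic character directly next to a Latin one, in either order.
MIXED_RE = re.compile(f"[{CYRILLIC}][{LATIN}]|[{LATIN}][{CYRILLIC}]")


def find_mixed_words(text):
    """Return the words that contain adjacent Cyrillic and Latin characters."""
    return [word for word in re.findall(r"\w+", text) if MIXED_RE.search(word)]


# "\u0410" is the Cyrillic capital A that looks like the Latin A.
print(find_mixed_words("\u0410pple pie"))
```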