Benutzer:Duesentrieb/Semantic Wiki Web/Chat Log

From Wikipedia, the free encyclopedia

Edited log of freenode.net#mediawiki, starting at 9. Aug. 2004, 23:39 UTC:

Talking: Duesentrieb, Head, Elian, Hemanshu, Datura, Sesse, Brion, Cyrius, TimStarling



  • 00:39 -!- Duesentrieb [~Duesentri@p548069B3.dip.t-dialin.net] has joined #mediawiki
  • 00:40 <Duesentrieb> Hi All! Is anyone up for a little talk about categories, their use, misuse and implementation? In the German WP, we seem to be wrapped in complete chaos with regard to categories...
  • 00:41 <Head> hi
  • 00:42 <Head> i think they're all busy fixing bugs to prepare the next final version
  • 00:42 <Hemanshu> Duesentrieb: try discussing in #wikipedia or #de.wikipedia
  • 00:43 <Duesentrieb> Hi Head!
  • 00:43 <Head> hemanshu: the fact is that they need some development work to fulfill their category plans
  • 00:43 <Hemanshu> ok
  • 00:43 <Duesentrieb> Hemanshu: yes: my questions are about implementation...
  • 00:43 <Hemanshu> ok
  • 00:44 <Duesentrieb> The first and simplest thing: We would like to be able to search for cross-sections of categories. How hard would that be?
  • 00:45 <Duesentrieb> Second: Would it be possible to list all members of a category, plus all members of all sub-categories, recursively?
  • 00:45 <Head> duesentrieb: 1) i guess the database stuff would be quite easy, but creating a useful interface isn't
  • 00:46 <Head> 2) possible with my bot
  • 00:46 <Duesentrieb> I was thinking about a page offering two or so fields by default, and presenting a "more" button to add more. That should do for a start.
  • 00:47 <Duesentrieb> As to the all-members-recursive thing: This would need to be dynamic, and it should be possible to combine it with the search. Example:
  • 00:48 <Duesentrieb> If I want to find all English writers, i would cross cat. English with cat. Writer. Now let Poet be a sub-category of Writer - i would like to have all poets included in the search, without having to add *both* categories to every article.
  • 00:49 <Duesentrieb> any thoughts?
  • 00:51 <elian_> hi Duesentrieb :-)
  • 00:51 <Head> Duesentrieb, then you might also want to have an option to list all NON-English writers
  • 00:51 <Duesentrieb> @elian: hi to you too!
  • 00:52 <Duesentrieb> @Head: that would be cool, but not crucial.
  • 00:52 <Head> so you need a checkbox 'NOT' and a checkbox 'include subcategories' and a textfield to enter the category name
  • 00:53 -!- TimStarling [~tstarling@zwinger.wikimedia.org] has joined #mediawiki
  • 00:53 <Duesentrieb> @Head: yes, maybe. But the advanced stuff could be left to advanced people;) We could invent a query-language... but no, that's too much.
  • 00:55 <Duesentrieb> @elian: would you have a look at the two bottom sections of this page, when you are in the mood: http://de.wikipedia.org/wiki/Wikipedia_Diskussion:Kategorien
  • 00:56 <elian_> Duesentrieb: I just had a look
  • 00:56 <elian_> but I tend to ignore categories
  • 00:57 <Duesentrieb> @elian: Is that categorical (pun intended), or could that change if categories were made into something that actually works for people?
  • 00:58 <Duesentrieb> Ok... now for something not so completely different: I have had an alternative idea to categories haunting me for a while, something along the lines of RDF. I was thinking about putting a writeup into the meta-WP. Could anyone suggest a good place? I don't know my way around there too well...
  • 00:59 <datura> Duesentrieb: i thought about that for some time myself..
  • 00:59 <elian_> Duesentrieb: actually I'm waiting for people to make something reasonable out of it
  • 01:06 <Duesentrieb> @elian: come and help! "People" are trying to think of something "reasonable"... an additional good thinker would really help...
  • 01:08 <elian_> Duesentrieb: I've to limit myself a little bit - can't work on everything
  • 01:08 <Duesentrieb> @datura: what were your ideas?
  • 01:09 <Duesentrieb> @elian: oh, really? i never thought of that;) well, join us when you have time...
  • 01:09 <elian_> Duesentrieb: the title is very promising ;-)
  • 01:09 <Duesentrieb> I'm going to put up a Page at http://de.wikipedia.org/wiki/Benutzer:Duesentrieb/Semantic_Wiki_Web about that issue.
  • 01:09 <Duesentrieb> There's just a stub there right now, but i'll fill it over the next few days.
  • 01:10 <Duesentrieb> Please come ye all and join in on the discussion-page! But i'll post again when there is some content there...
  • 01:13 <datura> Duesentrieb: the thing is, neither access to WP content nor the content itself is very machine-friendly. both would imho be useful in the future. so the logical next step would be to build a machine-friendly API to the database, and enable people to add machine-readable content more easily.
  • 01:14 <datura> categories and catchwords are not bad, but rdf-triples are much more general - so if we find a good syntax, we could embed them easily.
  • 01:14 <Duesentrieb> datura: yes, exactly!
  • 01:15 <datura> the idea simply suggested itself..
  • 01:15 <Duesentrieb> I was thinking like this: don't let people define Categories, let them define Relations!
  • 01:15 <datura> i wonder whether the community really is able to provide good RDF-data..
  • 01:16 <datura> most people think in categories (classes) or prototypes more often than in relations..
  • 01:16 <datura> i tend to think in pointers ;)
  • 01:17 <Duesentrieb> Pointers are not it - think in relations, and relations of relations. Triplets are the atoms of ontology...
  • 01:18 <Duesentrieb> I was thinking we link articles like this: in the article Berlin, put [[Ist Teil von>>Deutschland]] ("Ist Teil von" = "is part of").
  • 01:18 <Duesentrieb> Or, the reverse: put the link [[Ist Teil von<<Berlin]] into the article Deutschland.
  • 01:19 <Duesentrieb> The type of relation would be defined by its use, and would feature a description page. On that page, you could then define that the relation is a subtype of some existing relation.
  • 01:19 <Duesentrieb> (am i making sense?)
  • 01:20 <Sesse> I'm a bit unsure how useful those really are
  • 01:21 <Sesse> somehow I like the "reindeer" story in http://www.namesys.com/whitepaper.html
  • 01:21 <Sesse> (yea, yea, it's from the eraserfs-people :-) )
  • 01:23 <Duesentrieb> Sesse: heh cool, bookmarked.
  • 01:23 <Duesentrieb> Sesse: You are right that one can use RDF to produce a complete mess - or a complete lock-in.
  • 01:24 <Duesentrieb> But the beauty is in the fact that you can *define* more useful relations than "is-a" or "instance of", etc. Those are just the basics. A generalisation of all of these along the lines of "is related to" is even trivial...
  • 01:27 <Duesentrieb> As it is now, the only relation we have is "is somehow related". Humans are often able to deduce the correct relation from context, machines are not. So, to provide a more useful interface for both, I think it would make more sense to allow more specific relations, like "is component of", "is instance of", or "is president of".
  • 01:27 <Duesentrieb> (must stop rant... is anyone still listening?)
  • 01:27 <Sesse> oh, but "is president of" could also be "is head of state for"
  • 01:27 <Sesse> or "is a politician in"
  • 01:27 <Sesse> or "lives in the white house"
  • 01:27 <Sesse> :-P
  • 01:29 <Duesentrieb> Yes, so? We have the same problem with naming articles - the solution is usually a redirect. One could define aliases for relations. But that's not the issue:
  • 01:30 <Duesentrieb> You don't have to know the name of the relation to find things. In every article, you are offered links of different types to other articles, stating the exact relation.
  • 01:31 <Duesentrieb> That would be helpful for humans and boxes, i think. And providing a free, growing and flexible ontology would be extremely cool!
  • 01:32 <Duesentrieb> What's the better name: "Semantic Wiki Web" or "Wikitology"? Quick, someone grab the domain!
  • 01:34 <Duesentrieb> But i'm ranting on the tech-channel. I will put that down on the page mentioned earlier, in a more structured form. well, actually not yet: the start is going to be a dump of this chat;)
  • 01:34 -!- Cyrius [~t-bone@cpe-24-175-249-240.gt.rr.com] has joined #MediaWiki
  • 01:34 <brion> Cyrius: save yourself! we've got a semantic web loony on the loose! ;)
  • 01:36 <Duesentrieb> Heh, you're right. Actually I'm not keen about RDF itself. But Ontologies as such are cool... and going to be important in the future. But don't worry, i'll stop in a second.
  • 01:36 <Duesentrieb> The question i would really like to ask is:
  • 01:36 <Duesentrieb> How hard would it be to code the triplet-logic into MediaWiki and let users define relations like they define categories now? Completely out of this world, or not so hard at all?
  • 01:36 <brion> probably not insanely hard.
  • 01:37 <brion> it could extend the existing category system perhaps in a freakish way to allow specifying a relationship
  • 01:37 <Duesentrieb> erm, a "freakish way" is not what i wanted to hear;) but you boost my hopes, anyway...
  • 01:38 <Duesentrieb> My thought was: For efficiency, it would be best to map the triplets directly into DB-tables.
  • 01:39 <brion> basically, the category system maps article is-related-to category
  • 01:39 <Duesentrieb> But that would result in either having everything in a single table, or having thousands of tables, and dynamically creating new ones by the minute...
  • 01:39 <Duesentrieb> @brion: yes.
  • 01:39 <brion> what would you need that a table of triplets wouldn't give?
  • 01:40 -!- Guanaco [~Guanaco@ip68-102-29-237.ks.ok.cox.net] has quit ["Chatzilla 0.9.64b [Mozilla rv:1.7/20040707]"]
  • 01:41 <Duesentrieb> Nothing: by the RDF-model, *everything* can be expressed as a single table of triplets. The question is: would that still be efficient? I think it takes about 5-10 triplets per article to make a *good* web.
  • 01:41 <brion> well how do you expect to use them?
  • 01:43 <Duesentrieb> @brion: for a start, i would think to show those kind of like qualified categories:
  • 01:45 <Duesentrieb> Instead of just saying "Related to XYZ", the "special" links would be labeled "Is in XYZ", "Is Component of XYZ", "Is Author of XYZ", etc.
  • 01:45 <Sesse> Duesentrieb: what exactly do you plan to use all this information for, btw?
  • 01:46 <brion> Duesentrieb: sounds ideal for a table of triplets.
  • 01:46 <Head> sesse: the problem is that until now, nearly nobody uses categories to search something up
  • 01:46 <Head> although it's a lot of work to maintain them
  • 01:47 <Duesentrieb> @Sesse: For now, just a working category system, where relations could be separated into "Object/Space/Time/Topic" (e.g. Organisation/Germany/20th Century/Government). That's the major concern right now, especially with the Space-Axis.
  • 01:47 <brion> fulltext keyword search requires nil maintenance and generally does a good job.
  • 01:47 <Sesse> brion: mm
  • 01:48 <brion> the main purpose of categories is to be able to make broad divisions for overviews of many pages in manageable chunks
  • 01:48 <Duesentrieb> The infamous example is that the "Hamburger Aalsuppe" (a dish) is related to, but not a part of Hamburg. It should be possible to express that somehow...
  • 01:49 <brion> The question is, what purpose does expressing the relation serve? And is it worth expending a lot of effort to do it?
  • 01:49 <Duesentrieb> @brion: generally, yes. But the categorisation has virtually come to a halt due to the fact that it does not seem to be possible to find a sensible system with the tools we have now.
  • 01:49 <Sesse> Duesentrieb: easy: "Hamburger Aalsuppe is named after the city Hamburg"
  • 01:49 <brion> humans are the target audience, not computers.
  • 01:50 <Sesse> it is then related to, but not a part of Hamburg.
  • 01:50 <brion> what will a computer do with the information that we need to go to great effort to provide it?
  • 01:50 <Head> sesse: the problem is: category Hamburg is in category Place in Germany
  • 01:50 <Sesse> Head: it doesn't need to be in category Hamburg
  • 01:50 <Sesse> the Hamburg link takes care of that
  • 01:51 <Duesentrieb> For instance, one could generate overview-pages by location (or time, or whatever), that would be helpful. The problem is that right now, in a category of a country, you find places *in* that country, people *from* that country, dishes *named after* that country, etc.
  • 01:51 <Duesentrieb> Sometimes you might want that, and sometimes you don't.
  • 01:52 <TimStarling> we have overview pages, they're the main articles related to that topic
  • 01:52 <brion> seems like a bit of a non-problem, though. if you only want _places_, make sure that you've got Category:Place on it too.
  • 01:52 <Duesentrieb> @sesse: Yes, the example was that for the stated reasons, the Aalsuppe *must not* be in the category for Hamburg. But people don't understand that, so it returns.
  • 01:52 <TimStarling> arranged in nice human-readable format, most important things first
  • 01:54 <Duesentrieb> phew, seems like i have raised some heads... I'm quite busy trying to respond to everyone at once...
  • 01:54 <Head> duesentrieb: i think you could solve the problem by renaming category:Hamburg to category:Place in Hamburg
  • 01:55 <Duesentrieb> Just a few more things: It really is an issue: have a look at http://de.wikipedia.org/wiki/Wikipedia:Kategorien, and especially the discussion-page. Even if you don't understand german, the sheer size, structure and history...
  • 01:56 <Head> Duesentrieb, have you seen http://de.wikipedia.org/wiki/Wikipedia:Kategorien/%C3%9Cbersicht ? (600 kB large)
  • 01:56 <brion> Duesentrieb: i think the problem comes from trying to treat categories as rigid and inclusive. a 'real' query comes from combinations of relationships.
  • 01:56 <Duesentrieb> @Head: yes, thanks for that! I was typing the link as you posted! beat me to it by a few seconds...
  • 01:57 <Duesentrieb> brion: but until now, we could not combine relationships, because it's not possible to query cross-sections
  • 01:58 <Duesentrieb> we would at least need that... I think we could actually cope with just that functionality. The RDF-Model is just a "If I was King" fantasy...
  • 02:00 <Duesentrieb> Just to answer why i think a machine-readable ontology resulting from the wikipedia would be cool:
  • 02:01 <Duesentrieb> Ontologies are going to be *very* important. More and more real-world processes are being modeled. The people that do that need pre-built ontologies that allow them to perform "reasoning" about the real world.
  • 02:02 <Duesentrieb> Good ontologies are rare and expensive. It's a *hot* field of research. Providing a grass-roots ontology, growing from an open-content dictionary seems - obvious.
  • 02:02 <Sesse> I had to go check wikipedia to find out what an ontology was :-P
  • 02:03 <Duesentrieb> @sesse: heh! enlightened?
  • 02:03 <Sesse> Duesentrieb: not really
  • 02:03 <Sesse> I'm a bit too tired to grab it, I think
  • 02:04 <Duesentrieb> In a nutshell, an ontology is a model of (part of) the world, that formally defines the relationship of all objects to each other.
  • 02:04 <Duesentrieb> Modeling in OOP is a rough ontology - the real thing is much more flexible.
  • 02:06 <Duesentrieb> Ontologies are a way to formally model knowledge of the world. Like an encyclopedia for AI systems, data mining, etc
  • 02:08 <Duesentrieb> I need to go to bed soon. But i'll do the writeup as promised earlier, and post a link here. maybe we can talk some more tomorrow - maybe on #wikipedia for the non-technical stuff?
  • 02:08 <Duesentrieb> (just thinking - that baby could become my major thesis... maybe...)
  • 02:21 <Duesentrieb> good night all! and thanks for your ear... you can have it back now;)
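
The two requests from the start of the log (searching cross-sections of categories, with sub-categories included recursively) and the later "table of triplets" idea can be sketched together. What follows is a minimal Python sketch under stated assumptions, not MediaWiki code: the class `TripleStore`, the relation names `is in category` and `is subcategory of`, and the sample articles are hypothetical and chosen only for illustration. Only the relation `Ist Teil von` and the Berlin/Deutschland pair are taken from the discussion.

```python
# Hypothetical relation names; the log proposes the idea, not a schema.
IN_CATEGORY = "is in category"
SUBCATEGORY_OF = "is subcategory of"


class TripleStore:
    """A single in-memory table of (subject, relation, object) triplets."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def subjects(self, relation, obj):
        """All subjects s for which (s, relation, obj) is stored."""
        return {s for (s, r, o) in self.triples if r == relation and o == obj}

    def with_subcategories(self, category):
        """The category itself plus all of its sub-categories, recursively."""
        seen = set()
        stack = [category]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guards against cycles in the category graph
            seen.add(current)
            stack.extend(self.subjects(SUBCATEGORY_OF, current))
        return seen

    def members(self, category, include_subcategories=True):
        """Articles in the category, optionally including sub-categories."""
        if include_subcategories:
            categories = self.with_subcategories(category)
        else:
            categories = {category}
        result = set()
        for cat in categories:
            result |= self.subjects(IN_CATEGORY, cat)
        return result

    def intersect(self, *categories):
        """Articles that belong to every one of the given categories."""
        sets = [self.members(c) for c in categories]
        return set.intersection(*sets) if sets else set()


# Illustrative sample data (assumed, not from the log, except the last triple).
store = TripleStore()
store.add("Poet", SUBCATEGORY_OF, "Writer")
store.add("John Keats", IN_CATEGORY, "Poet")
store.add("John Keats", IN_CATEGORY, "English")
store.add("Jane Austen", IN_CATEGORY, "Writer")
store.add("Jane Austen", IN_CATEGORY, "English")
store.add("Goethe", IN_CATEGORY, "Poet")
store.add("Goethe", IN_CATEGORY, "German")
store.add("Berlin", "Ist Teil von", "Deutschland")  # typed relation from the log

# Poets count as writers without being tagged with both categories.
print(sorted(store.intersect("English", "Writer")))
# prints ['Jane Austen', 'John Keats']
```

In this sketch, category membership is just one relation among others, which matches brion's remark that the category system maps "article is-related-to category". Whether a single table like this stays efficient at Wikipedia's scale is the open question raised in the log; the sketch scans the whole set on every lookup and makes no attempt to answer it.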