More ding-dong on the authority of Wikipedia recently, with much of the debate swirling around Many-2-Many.
Clay Shirky posted something there today that caught my eye: a proposal to side-step the argument with information design.
He proposes a ‘dashboard’ for each entry, letting the reader make up his or her own mind about the veracity of the information by making the contributions and changes to that entry over time transparent.
This, to me, was precisely what Martin Wattenberg was exploring with his History Flow project for IBM, except that it uses visualisations to let one assess the ‘shape’ of an entry’s evolution quickly.
Teaming this up with Edward Tufte’s sparklines concept (visualisations of supplementary information set inline with the main text) led me to mock up something that marries Clay’s “trust profile per item” with Martin’s visualisation, giving the user a quick idea of the entry’s history: Historyflow sparklines.
For speed, I have used the examples that Wattenberg et al. have in their project gallery on the IBM site, but the project screenshots appear to be a couple of years old, so they do not reflect the current state of the Wikipedia entries.
First example: Iraq
Iraq’s entry seems to have a ‘hockey-stick’ trajectory, with a marked upsurge in changes as the conflict there unfolded. There is also a discontinuity, where there must have been some controversy, ‘damage’ and repair.
Shrinking this down to sparkline size (e.g. 25px high), flipping it vertically (which seems to make more sense to me, sorry Martin…) and putting it inline with the ‘history’ tab gives you something like the image on the right. [Image missing]
The information seems to hold up even at sparkline scale, and that was just me doing a very sloppy job in five minutes of Photoshop. An auto-generated sparkline with proper data and definition would, I think, work well.
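Auto-generating it would mean doing the shrink-and-flip in code rather than in Photoshop. As a minimal sketch, assuming the history has already been reduced to one non-negative number per revision (say, article length, which throws away History Flow’s per-author colouring), scaling to a 25px-high strip and flipping it vertically come down to a little arithmetic:

```python
def scale_to_height(values, height=25):
    """Scale non-negative values so the largest one maps to `height` pixels."""
    peak = max(values, default=0)
    if peak == 0:
        return [0 for _ in values]
    return [round(v * height / peak) for v in values]


def bar_spans(values, height=25, flip=True):
    """Return (top, bottom) pixel rows for each column; row 0 is the top edge.

    flip=True anchors the bars to the bottom edge so they grow upward (the
    vertically flipped version); flip=False hangs them from the top edge.
    """
    spans = []
    for h in scale_to_height(values, height):
        spans.append((height - h, height) if flip else (0, h))
    return spans
```

For example, bar_spans([0, 40, 100]) gives [(25, 25), (15, 25), (0, 25)]: an empty column, a column 10px tall and a full-height column, all standing on the bottom edge.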
So, what about a non-controversial example?
In the History Flow gallery, the entry for “Love” seems to be a lot calmer and less choppy:
Here again, a very quick mock-up of the History Flow plot embedded as a sparkline in the tabs of the Wikipedia entry.
Here it is quickly apparent that this entry has not seen rapid growth and has been relatively stable.
Although this needs a lot more work, I think there is the germ of something here, and I would appreciate the thoughts of those more knowledgeable about data visualisation (paging Martin W., Joshua S. and Tom Carden to the white courtesy phone) on whether this is worthwhile and doable.
I guess the major flaw in this, compared to Clay’s mini-changelog report on each entry, is that it’s a graphic abstraction whose meaning you would have to learn before it became useful.
Personally, as with History Flow, I think its meaning is fairly immediate, and as a dynamic, constantly changing visual ‘bug’ on the otherwise frugal Wikipedia page design it would pique people’s curiosity to find out what it meant and/or click on the history tab.
I don’t really like web UIs where there is a symbol or button that someone has had to supplement with a ‘[what does this mean?]’ hyperlink beside it – and I don’t believe that Historyflow sparklines would need that. It might be enough for the data that Clay pulls out into his mini-changelog to appear when the user hovers the mouse over the sparkline bug, creating an association between the graph and the data.
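One simple way to get that hover association, assuming the sparkline is served as an ordinary image, is the standard HTML title attribute, which browsers display as a tooltip. A sketch that builds the tag; the three summary fields are hypothetical stand-ins for whatever Clay’s changelog would actually report:

```python
from html import escape


def sparkline_img_tag(src, edits, editors, last_edited):
    """Build an <img> tag whose hover tooltip summarises the entry's history.

    edits, editors and last_edited are hypothetical stand-ins for the
    mini-changelog figures; browsers show the title attribute on hover.
    """
    summary = f"{edits} edits by {editors} editors; last edited {last_edited}"
    return (f'<img src="{escape(src)}" alt="Edit history sparkline" '
            f'height="25" title="{escape(summary)}">')
```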
Anyway – quickly done, but I think there’s something there.
Beyond Wikipedia, in the increasing production and use of ‘we-media’, there is an information design problem to be solved, and it would be great to see more people trying to solve it rather than arguing about ‘authority’.
Thoughts on “Wikipedia: Shirky / (Tufte x Wattenburg) = ?”
I reckon you’re onto something. Good on ya! I love “watching” you work.
I’ve sat down with Shirky more than once to discuss thorny design issues. It’s one of the qualities that make him shine: he’s very astute at pinpointing problems and bringing them to the design community. He’s much less concerned with handing down edicts and enforcing notions of what he thinks the solution is, which can’t be said of all academics.
Great idea, Matt, probably worth cross-posting to the Ask E.T. forum at http://www.edwardtufte.com to see what the sparkline fans there think.
A couple of thoughts…
The history flow colours give an indication of how much of the article persists over time. If you’re going to drop to black and white, you’re only showing how the length of the article changes.
On a really controversial (or badly misunderstood) topic, the entire content might flip-flop without the length ever changing. So the history flow silhouettes wouldn’t be meaningful there.
That said, a sparkline indication of how much change has occurred in an article over time would be excellent.
I think a lot of info visualizations would integrate better, and so enhance user interaction more seamlessly, if they used sparklines in the way you’ve suggested. Nice one!
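The point about colours is easy to demonstrate. In this sketch, which assumes revisions are available as plain strings and uses Python’s standard difflib, a length-only series stays flat when the content is swapped out wholesale, while a simple per-revision "churn" count does not. The churn measure is an illustration of my own, not what History Flow computes:

```python
from difflib import SequenceMatcher


def length_series(revisions):
    """Article length per revision: all a black-and-white silhouette can show."""
    return [len(text) for text in revisions]


def churn_series(revisions):
    """Characters in each revision not carried over from the previous one.

    The first revision counts in full; pure deletions are not counted.
    """
    churn = [len(revisions[0])] if revisions else []
    for prev, curr in zip(revisions, revisions[1:]):
        matcher = SequenceMatcher(None, prev, curr, autojunk=False)
        kept = sum(block.size for block in matcher.get_matching_blocks())
        churn.append(len(curr) - kept)
    return churn
```

With the revisions "cats are good" and then "dogs are good", the lengths are [13, 13] but the churn is [13, 3], so the edit shows up even though the silhouette would not change.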
Just a remark: Martin’s surname is Wattenberg.
Nice application of sparklines. I started a PHP sparkline project a few months ago that might help jump-start an implementation of this idea against live data. It should be a simple matter to query aggregate Wikipedia stats and render / cache a sparkline image.
The library is at http://sparkline.org
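To get a feel for the render step, here is a dependency-free Python sketch that turns a list of change counts into a word-sized inline SVG polyline. It does not use or mirror the sparkline.org PHP API; the default size and stroke are arbitrary:

```python
def sparkline_svg(counts, width=100, height=25):
    """Render counts as a word-sized SVG polyline; larger counts are drawn higher."""
    svg_open = f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    if not counts:
        return svg_open + "</svg>"
    peak = max(counts) or 1  # all-zero data gives a flat line along the bottom
    step = width / max(len(counts) - 1, 1)  # horizontal spacing between points
    points = " ".join(
        f"{i * step:.1f},{height - c * height / peak:.1f}"
        for i, c in enumerate(counts)
    )
    return (svg_open
            + f'<polyline points="{points}" fill="none" stroke="black" stroke-width="1"/>'
            + "</svg>")
```

The result can be inlined in the page or written to a file and served as an image.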
Hey James –
Mike Migurski added some sparklines to my installation of reblog, as an experiment. It’s not public yet, but you can see what it looks like here:
Of course, History Flow includes a -lot- more information than just # of changes; it’s basically a visual diff.
Still, sparklines would be darned useful.
All Wikipedia data is here:
You’d need to poll for the number of changes over some period of time. It’s a lot of data (86 GB currently), or 1.5 GB if you took just the tip (the current revisions) and then reached back for anything that had actually changed. I’m sure a lot of that weight is images, which arguably could be excluded from the sparklines.
It’d be a good idea to contact the Wikipedia admins about this, since of course they have the data (and horsepower) locally.
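Polling for the number of changes per period could be as simple as bucketing revision timestamps. A sketch, assuming the dump has already been parsed into a list of edit timestamps per page (the parsing itself is left out), counting edits per Monday-to-Sunday week and keeping empty weeks as zero so that quiet spells remain visible:

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable, List


def week_start(day: date) -> date:
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_edit_counts(timestamps: Iterable[datetime], start: date, end: date) -> List[int]:
    """Number of edits in each week from start to end, empty weeks included.

    Edits falling outside the start..end weeks are simply ignored.
    """
    counts = Counter(week_start(ts.date()) for ts in timestamps)
    series = []
    week = week_start(start)
    while week <= end:
        series.append(counts.get(week, 0))
        week += timedelta(days=7)
    return series
```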
If they aren’t too interested, then we can still make it work.
– Get the periodic number of changes to pages.
– Generate new sparkline images for changed pages.
– Host the images somewhere… with heavy caching, obviously.
– Make a Greasemonkey user script to attach the correct image to the current page. (Or make a site-specific extension, but that seems overkill.)
Greasemonkey here: http://greasemonkey.mozdev.com
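The first three steps boil down to re-rendering only the pages whose counts changed and serving everything else from a cache. A sketch, with in-memory dictionaries standing in for the real image host and cache (both hypothetical here) and the render function passed in, so any renderer would do. The Greasemonkey step runs in the browser and is not covered:

```python
from typing import Callable, Dict, List, Optional


class SparklineCache:
    """Keeps one rendered sparkline per page, re-rendering only on new data.

    The dicts stand in for a real image host and HTTP cache (hypothetical).
    """

    def __init__(self, render: Callable[[List[int]], str]):
        self.render = render
        self.series: Dict[str, List[int]] = {}
        self.images: Dict[str, str] = {}
        self.renders = 0  # how many times render() has actually run

    def update(self, latest: Dict[str, List[int]]) -> List[str]:
        """Apply the latest per-page change counts; return the pages re-rendered."""
        changed = []
        for page, counts in latest.items():
            if self.series.get(page) != counts:
                self.series[page] = list(counts)
                self.images[page] = self.render(counts)
                self.renders += 1
                changed.append(page)
        return changed

    def image_for(self, page: str) -> Optional[str]:
        """Return the cached image for a page, or None if none has been made."""
        return self.images.get(page)
```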
I agree with what daniel harvey said above about the colors. It’d be interesting to see the embedded sparkline in some combination of color that might indicate how much the article has changed and how much controversy it may entail.
Metacritic-like infographics. No server side PHP-hell (sparklines), but wise CSS.
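Taking the CSS route, the same counts can be drawn as a row of inline-block elements whose heights are set in CSS, so nothing has to be rendered as an image on the server. A sketch in Python that emits the HTML fragment; the styling values are arbitrary:

```python
def css_sparkline(counts, height_px=25, bar_px=2):
    """Return an HTML fragment: one bottom-aligned bar per count, sized in CSS."""
    peak = max(counts, default=0) or 1  # avoid dividing by zero for empty/all-zero data
    bars = "".join(
        f'<span style="display:inline-block;vertical-align:bottom;'
        f'width:{bar_px}px;height:{round(c * height_px / peak)}px;'
        f'background:#333;margin-right:1px"></span>'
        for c in counts
    )
    return (f'<span class="sparkline" style="display:inline-block;'
            f'height:{height_px}px">{bars}</span>')
```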
“… Wikipedia. Considering the nature of the knowledge construction taking place in Wikipedia, one never knows how much “truth” the actual page contains until she/he checks some crucial information about the given page. [Worth noting is the overall good “truth-factor” of Wikipedia; see Lih, Andrew: Wikipedia as Participatory Journalism: Reliable Sources? Metrics for evaluating collaborative media as a news resource, 5th International Symposium on Online Journalism, 2004]
Every Wikipedia page has some metadata-like attributes that are not shown, or not easily recognizable, but are stored in the database. These are the following:
– whether the content of the page was ever discussed, by anyone,
– when the page was started,
– how many people contributed to the page,
– how many edits were made to the page,
– whether there were any major flame wars or vandalism regarding the content of the page,
– how many other pages link here.
I propose that if every Wikipedia page had a graphical representation of these data or their relation to each other, it would help the user form an immediate opinion of how much she/he should trust that page. (Font size, coloring and shading could easily be done even within HTML using CSS – no need for special graphic generation methods.)
If we extend this idea by treating the user as an active contributor, we should show her/him how “close” or “far” a given article is to her/him. We should interpret “closeness” based on an Erdős-number-like model. Closeness means trustability. And here we also come back to the geographically sensitive sticker board, which really works when you know you can trust or mistrust information based on “closeness”.
Trust is handled in quite different ways in collaborative knowledge communities; I personally think that the most interesting experiences came from Slashdot (see: Rutigliano, Lou: When the Audience is the Producer: The Art of the Collaborative Weblog – http://journalism.utexas.edu/onlinejournalism/audienceproducer.pdf). Even technically sophisticated users are lazy, and every feedback mechanism should be designed around this simple fact. Giving users a number of “trust points” they can give to articles, I propose three easy buttons: “I like it” (you give trust), “I don’t like it” (you lower trust) and “Alert” (you see something strange). If you use either “I like it” or “I don’t like it” too much, the system has to make you argue why you think so. In the case of “Alert” you have to describe why you do not trust what you see at all.”
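The Erdős-number-like “closeness” above can be read as a shortest-path distance in a graph where two users are linked if they have edited the same page. A breadth-first-search sketch; the co-editing graph, and the rule that an article’s distance is the distance to its nearest editor, are my assumptions rather than part of the proposal:

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def build_coeditor_graph(page_editors: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Link every pair of users who edited the same page."""
    graph: Dict[str, Set[str]] = {}
    for editors in page_editors.values():
        editors = set(editors)
        for user in editors:
            graph.setdefault(user, set()).update(editors - {user})
    return graph


def closeness(graph: Dict[str, Set[str]], reader: str,
              article_editors: Iterable[str]) -> Optional[int]:
    """Fewest co-editing hops from the reader to any editor of the article.

    0 means the reader edited the article; None means no connection at all.
    """
    targets = set(article_editors)
    if reader in targets:
        return 0
    seen = {reader}
    queue = deque([(reader, 0)])
    while queue:  # breadth-first, so the first editor reached is the nearest
        user, dist = queue.popleft()
        for neighbour in graph.get(user, ()):
            if neighbour in targets:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```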
Thanks for the links, soobrosa.
I’m not sure I agree about the presentation, but I agree the metrics should feed some sort of feedback mechanism.
So one thing that occurs to me is that, rather than displaying tooltips or explanations of what the graph means on the tab, you could simply display a larger and better-explained version of it on the page you click through to. It would only take – say – an inch of space (you know what I mean) to have a larger and more carefully annotated version of it. This would have the effect of explaining it, providing a more information-rich version of it and creating a nice semantic link between what you click on and what you get.
Thinking about it more, I’m now quite interested in what other bits of dynamic data, normally viewed on parallel pages, might be represented in truncated form in tabs and the like…
Honestly, I don’t have time to Photoshop it, but imagine a first graphic impression of trustability like this: http://tyrell.hu/~b2men/Clipboard01.jpg
I think this is a really neat idea as well. There might be some social implications though. As you start to quantify information graphically, it can start to become a “game” for some people to spike the graph, especially in a wiki environment.