This is great – someone has hacked a service that generates a custom RSS feed based on a keyword search of BBC News.
So, for example, if I wanted to track the UK Labour government's idiotic plans for ID cards, I just type in "ID Cards" and get an RSS feed to put in my news reader of choice.
It also works for the rest of the BBC website, so if I wanted to track any content on the band "Franz Ferdinand", I just type it in and get a feed of whatever content the BBC have got on the hip Scottish art-rock combo.
UPDATE: Someone from the BBC has asked me to take this entry down, so as a compromise I have removed the links to the site outlined above.
David, who posted from the BBC asking me to remove the reference, has replied more fully in the comments below and raised some good practical challenges to offering RSS and connecting web services to huge content sites like BBC News. Many thanks to him for taking the time to clarify and explain some of the issues.
I'd be grateful if you could remove this entry, please. I don't believe access to the BBC News search engine should be available like this, and it should certainly not be advertised publicly. We spoke to Paul Sissons earlier today and made it known to him that this should not be advertised publicly, thanks. Yours, David Thorpe.
Question for David Thorpe: why?
David – can you detail why? Why can’t the BBC’s infrastructure take a few hits from this excellent service?
If the objection really is "I don't believe that access to the BBC News search engine should be available like this" then, frankly, fuck 'em. The BBC News search engine is paid for by my licence fee, and I'll access it damn well how I please. If, on the other hand, there's a technical reason why this is A Bad Idea, then Mr Corporate BBC ought to explain it.
Mr Thorpe, I suggest that if you're going to be dealing with the public, you take a few PR lessons.
Understood, Ian. It is your content.
So here goes. Right now there are good tech reasons why keyword RSS access to the BBC News search engine isn't advisable. The main one is that the BBC News search engine is not optimised to handle the load of repeated 24/7 polling of identical queries by newsreaders.
David et al. are totally on the case, since the demand to access the BBC's content via RSS is clear. But what we must ensure is that RSS access by the few does not compromise HTML access by the many.
I hope this explains David’s concern – ensuring that a site as vital as bbc.co.uk/news is always available is a tough task.
Well, I do apologise for, apparently, my "sanctimonious tone"! The BBC search engine has been designed to be useful to people in a certain way, and without any limits on the number of searches that can be performed. A person using a search engine generally uses it for a specific task, performing one query after another. An RSS newsreader, however, would use the search engine in a different way: for example, you would likely subscribe to a few dozen search terms, and the newsreader would refresh each of them once every ten minutes or so, around the clock.
There are two problems here. On the BBC-wide search and internet search, an external search engine company is used, and the BBC pay per search performed. Costs would spiral out of control if the search engine were used in this way, and you would get a lot less value for your licence fee. On the BBC News search, we have limited search engine resources, and it's often a real balancing act to keep this service quick and reliable. Imagine now that the number of searches performed per hour doubles or triples, which is not at all unlikely. Once again, we would need to throw a great deal of resources at the search system, and that detracts from other areas of the BBC.
I do believe the BBC as a whole are interested in, and may develop, further resources via RSS and other syndication mechanisms. But I believe the BBC would like to be able to design systems in an appropriate and cost-effective manner.
Personally, I think use of the search engine in this "less than appropriate" manner is not a problem on a small scale, but it could become a nightmare on a larger scale, which is why I specifically asked people not to advertise this service widely. I am filled with some level of panic at the prospect of spending days or weeks keeping the news search system alive rather than doing something more useful. I hope that clarifies my position and that you understand I am not expressing any kind of corporate opinion here, but a personal one based on not wanting to be called into work at 3 o'clock in the morning.
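To put rough numbers on the polling argument above, here is a back-of-envelope sketch. It assumes "a few dozen" search terms means 36 and that every feed is refreshed on the ten-minute cycle described; the subscriber counts are purely hypothetical and are not BBC figures.

```python
# Back-of-envelope polling load from keyword RSS feeds.
# Assumptions (illustrative only, not BBC figures): "a few dozen" terms is
# taken as 36, and each feed is refreshed every ten minutes, 24 hours a day.

MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def polls_per_hour(terms: int, refresh_minutes: int) -> float:
    """Search queries one newsreader issues per hour."""
    return terms * MINUTES_PER_HOUR / refresh_minutes


def polls_per_day(terms: int, refresh_minutes: int, subscribers: int = 1) -> float:
    """Queries per day across a number of (hypothetical) identical subscribers."""
    return polls_per_hour(terms, refresh_minutes) * HOURS_PER_DAY * subscribers


if __name__ == "__main__":
    print(f"one reader: {polls_per_day(36, 10):.0f} queries/day")  # 5184
    print(f"1,000 readers: {polls_per_day(36, 10, 1000):.0f} queries/day")  # 5184000
```

Under these assumptions a single unattended newsreader issues about 5,000 searches a day, far more than a person typing queries by hand, and the load scales linearly with every extra subscriber.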
All is not lost.
A similar service could be created with Google News by searching for “keyword source:bbc_news”.
You could then place this search term into a third-party Google News to RSS generator such as http://www.voidstar.com/gnews2rss.php
Google: jolly good idea; I'll point my RSS search at that. Hurrah. Would you mind the URL for that being blogged?
Incidentally, Radio 1 are right this minute having an on-air competition where they give a trivia question and people have to solve it using the BBC Search on the Radio 1 site. They're seeing how many hits they can get (they're going for 370,000 today; I think they said they got 340,000 last time).
Google isn’t the answer here. They’re liable to get very uppity if you scrape their site (rather than using published APIs), as their request to remove a CPAN module shows:
http://use.perl.org/article.pl?sid=02/03/01/1641208&mode=nested
As yet, there is no API for Google News (or Gmail, for that matter).
Google News has already got uppity about RSS scraping; they’ve shut down a couple of sites hosting scripts which tap the engine and pump out keyword searches. I think only one remains.
Similar scripts have been hacked up for Yahoo! News.
Seems to me you can respect the concerns of the sites while still getting the information you want by setting up a Google News alert (http://www.google.com/newsalerts) and then using an email-to-RSS feed generator, like the Bloglines email subscriptions feature or Mail to RSS (http://www.iupload.com/product/mailbyrss.asp), to convert the email into a personalised RSS feed.
No hacking necessary, no abuse of the search engines, no violations of terms of service.