This code is a modified version of Sylvain Schmitz's PHP RSS feed. His code and my changes are both released under the GFDL.
I wanted an RSS feed of my watchlist almost immediately after joining Wikipedia. Unfortunately, the Wikipedia software doesn't support that feature. It seems that many Wikipedians have been hacking away at the problem, but none of the solutions on that page worked for me. Sylvain Schmitz's came close, but I didn't want to have to post it to a (public) webserver (and I don't speak French). So I started hacking away at Sylvain's code to see if I could make it do what I wanted. This is the result; may you find it useful.
If you have the time, ability, and inclination to add any of the to-do items listed at the top of the script (or an idea of your own), please do. It's open source, after all.
#!/usr/bin/php
<?php
/* Todo
-Determine if anonymous edits could be supported without $entries; this is
currently the array's only purpose
-Make configurable
-Look into getting timezone w/applescript
-Look into setting username/pass w/NNW params
-NNW appears to futz the params. D'oh!
*/
/*******************************************************************************
watchlistrss - a script that produces a feed of your watchlist.
Based on Sylvain Schmitz's script: http://meta.wikimedia.org/wiki/User:Sylvain_Schmitz/Watchlist_RSS_feed_in_PHP
Modified by Ryan Ballantyne (Ryos)
My changes from the original include:
-Make runnable from the PHP command-line SAPI
-Localize to English Wikipedia
-Change the RSS feed information to a format I find more useful
-Change output method to play nice with NetNewsWire
What it's for:
This script reads your watchlist from Wikipedia and transforms it into an
RSS feed that can be read by a newsreader that has the ability to subscribe
to scripts on the local machine. The only reader I know of with this ability
is NetNewsWire on the Mac.
How to use it:
1) Copy this code to a text file and name it with a .php extension.
When saving, keep in mind that the script saves a cookie file in the same
directory.
2) Configure the script. Set the following variables below:
$wp_name = 'yourname';
$wp_password = 'yourpass';
$wp_tmz = 'your timezone offset from GMT'; //ex: -06:00 (that's my zone)
$script_name = "the name you gave the script.php"; //This is used to name the cookie
3) Make the script executable. To do this, open the terminal and type:
chmod +x
Then, drag the script file to the terminal window and press return.
4) Subscribe to the script in NetNewsWire. Make sure to change the script type
from "Applescript" to "Shell Script".
5) You are now one hoopy frood. Enjoy.
Known bugs/issues:
-Linking to subpages is broken due to the / character being urlencoded to %2F
*******************************************************************************/
/****************************************************************** Setup. */
$printDebug = false;
// time zone on the server; default to GMT
/*$wp_tmz = "+00:00";
// Parse the options
$script_name = $argv[0];
for ($i = 1; $i < count ($argv); $i++) {
switch ($argv[$i]) {
case "-u":
case "--user":
$wp_name = $argv[$i+1];
$i++;
break;
case "-p":
case "--pass":
$wp_password = $argv[$i+1];
$i++;
break;
case "-t":
case "--timezone":
$wp_tmz = $argv[$i+1];
$i++;
break;
case "-d":
$printDebug = true;
break;
}
}
if (empty ($wp_name) || empty ($wp_password)) {
exit ("\nUsage: [-u|--user] username [-p|--pass] password\n\n");
}*/
$wp_name = 'yourusername';
$wp_password = 'yourpassword';
$wp_tmz = "+00:00";
$script_name = 'watchlistrss.php';
//Set error reporting based on if we're debugging
if (!$printDebug) { ini_set ('display_errors', '0'); }
else { ini_set ('display_errors', '1'); }
// default domain and path
$wp_domain = 'en.wikipedia.org';
$wp_watchlist = '/wiki/Special:Watchlist';
// maximum number of entries in the feed
$max_entries = 20;
// localized array for month names
$months = array ("January" => "01", "February" => "02", "March" => "03",
"April" => "04", "May" => "05", "June" => "06",
"July" => "07", "August" => "08", "September" => "09",
"October" => "10", "November" => "11", "December" => "12");
// localized user pages prefix
$wp_userpage = "User:";
// localized title
$wp_title = "Watchlist";
// localized description
$wp_description = "$wp_name's $wp_title";
/*********************************************************** End of setup. */
// name of the cookie file
$cookie_file = $script_name .'_'. $wp_domain .'_cookie';
// get the expiration time from the cookie
$time = 0;
$cookie_fp = fopen ($cookie_file, "r");
if ($cookie_fp)
{
while (!feof ($cookie_fp))
{
$cookie = fgets ($cookie_fp, 4096);
if (strpos ($cookie, "wikiUserID") !== FALSE)
{
$ce = explode ("\t", $cookie);
$time = $ce[4];
break;
}
}
fclose ($cookie_fp);
}
// check whether a new login is needed
if (($time - 60) < time ())
{
// login URL
$wp_login = '/?title=Special:Userlogin'
.'&action=submitlogin&type=login';
// login connection
$login = curl_init ();
$postdata = array ();
$postdata['wpName'] = $wp_name;
$postdata['wpPassword'] = $wp_password;
$postdata['wpRemember'] = '1';
$postdata['wpLoginattempt'] = 'true';
$post = null;
foreach ($postdata as $key=>$value)
if ($key && $value)
$post .= $key."=".urlencode($value)."&";
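// CURLOPT_MUTE is no longer available in PHP's cURL extension;
// CURLOPT_RETURNTRANSFER keeps the login response out of the feed output
curl_setopt ($login, CURLOPT_RETURNTRANSFER, TRUE);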
curl_setopt ($login, CURLOPT_POST, TRUE);
curl_setopt ($login, CURLOPT_POSTFIELDS, $post);
curl_setopt ($login, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt ($login, CURLOPT_URL, $wp_domain.$wp_login);
curl_exec ($login);
curl_close ($login);
}
// grab the contents
$content = curl_init ();
curl_setopt ($content, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt ($content, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt ($content, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt ($content, CURLOPT_URL, $wp_domain.$wp_watchlist);
$watchlist = curl_exec ($content);
curl_close ($content);
// function for ISO8601 time and date
function to_iso8601 ($date_str)
{
global $months;
$date_fields = explode (" ", $date_str);
$day = $date_fields[0];
if (strlen ($day) == 1)
$day = "0".$day;
$month = $date_fields[1];
$year = $date_fields[2];
return $year."-".$months[$month]."-".$day."T";
}
// explode the contents by days
define ('LENGTH_TIMESTR', 5);
define ('ANON_TITLETEXT', 'Special:Contributions');
$days = explode ("<h4>", $watchlist);
$links = array();
$titles = array();
$descriptions = array();
$entries = array();
$times = array();
$authors = array();
$nentries = 0;
for ($i = 1; $i < sizeof ($days) && $nentries < $max_entries; $i++)
{
$the_date = to_iso8601 (substr ($days[$i], 0,
strpos ($days[$i], "</h4>")));
$lines = explode ("<br />", $days[$i]);
$tmp = explode (" . . ", $days[$i]);
//debug
if ($printDebug) {
echo "\$lines $i:";
echo "\n"; print_r ($lines); echo "\n\n";
echo "tmp $i:";
echo "\n"; print_r ($tmp); echo "\n\n";
}
for ($j = 0; $j < sizeof ($tmp)-1 && $nentries < $max_entries; $j++)
{
//links
$offset = strpos ($lines[$j], '<a href="') + 15;
$links[$nentries] = substr ($lines[$j], $offset,
strpos (substr ($lines[$j], $offset), '"'));
//descriptions
$offset = strpos ($lines[$j], '<tt>');
$descriptions[$nentries] = substr ($lines[$j], $offset);
//entries
$offset = strpos ($tmp[$j+1], ' title="') + 8;
$entries[$nentries] = substr ($tmp[$j+1], $offset,
strpos (substr ($tmp[$j+1], $offset), '"'));
//times
$offset = strpos ($tmp[$j], '; ') + 2;
$times[$nentries] = $the_date.substr ($tmp[$j], $offset, LENGTH_TIMESTR).$wp_tmz;
//authors
//Anonymous edits result in different output; we must treat it specially
if ($entries[$nentries] != ANON_TITLETEXT) {
$offset = strpos ($tmp[$j+1], ' title="'.$wp_userpage)
+ 8 + strlen ($wp_userpage);
$authors[$nentries] = substr ($tmp[$j+1], $offset,
strpos (substr ($tmp[$j+1], $offset), '"'));
}
else {
$offset = strpos ($tmp[$j+1], ' title="'.ANON_TITLETEXT)
+ 8 + strlen (ANON_TITLETEXT) + 2;
$authors[$nentries] = substr ($tmp[$j+1], $offset,
strpos (substr ($tmp[$j+1], $offset), '<'));
}
//titles
$offset = strpos ($lines[$j], ' title="') + 8;
$titles[$nentries] = substr ($lines[$j], $offset,
strpos (substr ($lines[$j], $offset), '"'));
$titles[$nentries] .= ' . . '. $authors[$nentries];
$nentries++;
}
}
//debug
if ($printDebug) {
echo "links:\n"; print_r ($links);
echo "titles:\n"; print_r ($titles);
echo "descriptions:\n"; print_r ($descriptions);
echo "entries:\n"; print_r ($entries);
echo "times:\n"; print_r ($times);
echo "authors:\n"; print_r ($authors);
}
/********************************************************* RSS generation. */
$disallowed_xml = array ("&", "<", ">");
$replacements_xml = array ("&amp;", "&lt;", "&gt;");
$output = '';
// header
$output .= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
$output .= "<!DOCTYPE rdf:RDF [\n";
$output .= "<!ENTITY % HTMLlat1 PUBLIC\n";
$output .= " \"-//W3C//ENTITIES Latin 1 for XHTML//EN\"\n";
$output .= " \"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent\">\n";
$output .= "]>\n";
$output .= "<rdf:RDF\n";
$output .= " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" \n";
$output .= " xmlns:sy=\"http://purl.org/rss/1.0/modules/syndication/\"\n";
$output .= " xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n";
//$output .= " xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n";
$output .= " xmlns=\"http://purl.org/rss/1.0/\"\n";
$output .= ">\n";
// channel summary
$output .= " <channel rdf:about=\"http://"
.$wp_domain.str_replace ($disallowed_xml,
$replacements_xml,
$wp_watchlist)."\">\n";
$output .= " <title>$wp_title</title>\n";
$output .= " <link>http://"
.$wp_domain.str_replace ($disallowed_xml,
$replacements_xml,
$wp_watchlist)."</link>\n";
$output .= " <description>$wp_description</description>\n";
$output .= " <dc:source>http://"
.$wp_domain.str_replace ($disallowed_xml,
$replacements_xml,
$wp_watchlist)."</dc:source>\n";
$output .= " <dc:date>".date("Y-m-d\TH:iO")."</dc:date>\n";
$output .= " <sy:updatePeriod>hourly</sy:updatePeriod>\n";
$output .= " <sy:updateFrequency>4</sy:updateFrequency>\n";
$output .= " <sy:updateBase>1970-01-01T00:00+00:00</sy:updateBase>\n";
$output .= " <items>\n";
$output .= " <rdf:Seq>\n";
for ($i = 0; $i < $nentries; $i++)
{
$output .= " <rdf:li resource=\"http://$wp_domain/wiki/"
.urlencode(str_replace (" ", "_", $links[$i]))."\" />\n";
}
$output .= " </rdf:Seq>\n";
$output .= " </items>\n";
$output .= "\n";
$output .= " </channel>\n";
// items
for ($i = 0; $i < $nentries; $i++)
{
$output .= " <item rdf:about=\"http://$wp_domain/wiki/"
.urlencode(str_replace (" ", "_", $links[$i]))."\">\n";
$output .= " <title>".$titles[$i]."</title>\n";
$output .= " <description>{$descriptions[$i]}</description>\n";
$output .= " <dc:creator>".$authors[$i]."</dc:creator>\n";
$output .= " <dc:date>".$times[$i]."</dc:date>\n";
$output .= " </item>\n\n";
}
// footer
$output .= "</rdf:RDF>\n";
exit ($output);
?>
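The subpage bug listed under known issues comes from urlencode() turning the / in a title into %2F. Here is a minimal sketch of one possible workaround. The helper name wiki_urlencode is hypothetical and not part of the script above, and I'm assuming it is safe to leave / (subpages) and : (namespaces) unescaped in the link, which I haven't tested against every kind of title.

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// encode a page title for use in a /wiki/ link, but keep the characters
// that separate subpages ("/") and namespaces (":") readable.
function wiki_urlencode ($title)
{
    // Same first step as the script: spaces become underscores, then urlencode
    $encoded = urlencode (str_replace (" ", "_", $title));
    // Undo the encoding only for "/" and ":"
    return str_replace (array ("%2F", "%3A"), array ("/", ":"), $encoded);
}

// Example with a made-up subpage title
echo wiki_urlencode ("User:Ryos/Watchlist RSS"), "\n"; // prints User:Ryos/Watchlist_RSS
```

If this turns out to work for you, the two urlencode(str_replace (" ", "_", ...)) calls in the RSS generation part of the script are where it would go.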