Aim:
A software program that allows quick offline reviewing of a number of articles and editors' contributions, to gain a sense of how matters have developed in an editing dispute, and to quickly scan for "who did what when", and its companion question, "where else was it done, by whom", and so on.
Also useful to quickly get diffs for these for use in cases.
In case others use different terms :)
A lot of data, sadly. And because it's a big job I don't honestly mind if it runs in the background grabbing data at a civilized rate for a few hours or overnight, that's fine by me. Wouldn't want to overload the server.
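One way to keep the background grab at a "civilized rate" is to enforce a minimum gap between requests. This is only a sketch: the `Throttle` class and the interval value are hypothetical, and the right delay depends on whatever the server's operators consider acceptable.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The fetch loop would call `throttle.wait()` before each request, so an overnight run never hits the server faster than the chosen interval.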
The rough outline is that it pulls down all relevant histories and contrib lists, and (initially) the most recent of these DIFFs. There could easily be 5k or 15k DIFFs (users may have 5k edits, and talk pages could easily have a thousand or so), so that's why it grabs all the DIFF information, but (initially) only a selection of the actual DIFFs and page contents.
It then pulls down other diffs on demand ("click to get the DIFF on this edit"), and saves these in its DB (if not already held). The list of diffs and the actual DIFF can then be (fairly simplistically) displayed and scanned visually, or sorted and filtered.
As an additional function it also allows quick selection and display of "diff between two points", so that one can select any two edits and it'll grab the diff between them into its database too and display it. This shows what effect a bunch of edits have had combined.
MySQL or MSAccess. Access is actually pretty good for me, but if others use it MySQL may be more sensible. Try Access 1st to test the usefulness :)
Also information on history of some kind, allowing one to skip to different views or filters to double check stuff.
The program accepts:
For the articles and named editors -- the program first pulls their entire edit history or contribs records. Not all pages or diffs will be pulled down initially, but the edit info for every DIFF is grabbed, even if the diff itself isn't pulled from the server.
For each of these that's within the date or edit count range, it also grabs the markup for the revision, and the rendered DIFF HTML and rendered article, from the DIFF page, and populates the two caches with these for all DIFFS and edit IDs that it has pulled.
It also grabs the logs for the named editors. (User uploads, user page moves, admin page protects, admin page deletes, admin user blocks, and block logs)
Upon completing the above load, the program "knows" about all relevant edits. For many of them it also has the wiki-markup, the rendered HTML, and the formatted DIFF from the previous version. Those it doesn't have, it will load on demand if the DB entry is empty.
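The "knows about every edit, loads the content on demand" idea could be sketched as below. This uses SQLite purely as a stand-in for MySQL or Access, and the table layout, column names, and the `fetch_diff` callback are all assumptions for illustration rather than a settled design.

```python
import sqlite3

# Hypothetical layout: one row per edit. The metadata is always present;
# the three content columns stay NULL until that edit is actually pulled.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edits (
    edit_id   INTEGER PRIMARY KEY,
    page      TEXT NOT NULL,
    editor    TEXT NOT NULL,
    edited_at TEXT NOT NULL,
    summary   TEXT,
    markup    TEXT,
    html      TEXT,
    diff_html TEXT
)
"""


def open_db(path=":memory:"):
    """Open (or create) the local cache database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def get_diff(conn, edit_id, fetch_diff):
    """Return the cached DIFF for edit_id, fetching and storing it if the entry is empty.

    fetch_diff is a caller-supplied function (edit_id -> HTML string) that
    does the actual download; it is only called on a cache miss.
    """
    row = conn.execute(
        "SELECT diff_html FROM edits WHERE edit_id = ?", (edit_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown edit id: {edit_id}")
    if row[0] is not None:
        return row[0]
    diff_html = fetch_diff(edit_id)
    conn.execute(
        "UPDATE edits SET diff_html = ? WHERE edit_id = ?", (diff_html, edit_id)
    )
    conn.commit()
    return diff_html
```

The same pattern would apply to the markup and rendered HTML columns; keeping the downloader as a parameter means the cache logic can be tried out without touching the server.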
It also has somewhere a copy of the header and footer of a typical current DIFF page, so that the HTML chunks can be re-rendered at will.
Filter/sort (by editor, date, and text search via manually entered SQL "WHERE" expression, or by completely manual SQL WHERE clause) - past filters and sorts remembered and listed for quick recall.
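A rough sketch of the manual-WHERE filter with remembered filters follows, again using SQLite as a stand-in. The `edits` and `saved_filters` table names and their columns are assumptions. Note that pasting a hand-typed WHERE clause straight into the query is only reasonable because this is a single-user local tool; it would be unsafe anywhere the text came from someone else.

```python
import sqlite3


def _ensure_filter_table(conn):
    # Hypothetical table that remembers past filters and sorts for quick recall.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_filters ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, where_clause TEXT, order_by TEXT)"
    )


def run_filter(conn, where="1=1", order_by="edited_at"):
    """Return edit IDs matching a manually entered WHERE clause, then remember it.

    The clause is interpolated as-is (local, trusted input only). A clause
    that fails to execute raises before anything is saved.
    """
    _ensure_filter_table(conn)
    sql = f"SELECT edit_id FROM edits WHERE {where} ORDER BY {order_by}"
    ids = [row[0] for row in conn.execute(sql)]
    conn.execute(
        "INSERT INTO saved_filters (where_clause, order_by) VALUES (?, ?)",
        (where, order_by),
    )
    conn.commit()
    return ids


def recent_filters(conn, limit=10):
    """List remembered (where_clause, order_by) pairs, most recent first."""
    _ensure_filter_table(conn)
    return conn.execute(
        "SELECT where_clause, order_by FROM saved_filters ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

For example, `run_filter(conn, "editor = 'ExampleUser' AND markup LIKE '%revert%'")` would give the listbox its edit IDs, and `recent_filters(conn)` would feed the quick-recall list.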
Split screen with a listbox of editIDs at the top, the (selectable) diff / wikimarkup / rendered HTML in the bottom half, and the edit ID info (URL, author, date, id# etc) in a line at the bottom for easy copying.
Purpose - allows scrolling through selected diffs quickly, with display in any of (markup/html/DIFF) in the bottom panel. If these aren't cached they are grabbed as needed at the time.
The DIFF list also allows multiple DIFF selection - clicking a button if 2 DIFFs are selected grabs the DIFF between those 2 versions (if not already cached) and displays that, until the selection is changed.
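For the two-selection case, the request could be built roughly as below. MediaWiki's `index.php` accepts `oldid` and `diff` parameters to show the difference between two revisions; the base URL here is a placeholder, the `diff_url` helper is hypothetical, and ordering the pair by ID assumes that a higher revision ID means a later edit on the same wiki.

```python
from urllib.parse import urlencode


def diff_url(rev_a, rev_b, base="https://wiki.example.org/w/index.php"):
    """Build the URL for the DIFF between two selected revisions.

    The two IDs may be given in either order; the lower one is treated
    as the older revision (an assumption, see the note above).
    """
    old, new = sorted((int(rev_a), int(rev_b)))
    if old == new:
        raise ValueError("need two different revisions")
    return f"{base}?{urlencode({'oldid': old, 'diff': new})}"
```

Normalising the order also gives a stable `(old, new)` pair to use as the cache key, so selecting the same two edits in reverse doesn't trigger a second download.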
For a set of selected editIDs in the listbox, create a single view showing DIFFs stacked one after the other with a heavy line in between, to allow single page review (and normal text search) of all the selected DIFFs on one page. Clicking on a diff pulls up the markup or rendered text for the editID concerned.
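The stacked view might be assembled along these lines, reusing the stored header and footer of a DIFF page around the cached chunks. The function name, the default header and footer, and the inline style for the heavy line are all placeholders.

```python
DEFAULT_HEADER = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>Selected DIFFs</title></head><body>'
)
DEFAULT_FOOTER = "</body></html>"

# The "heavy line" between consecutive DIFFs.
SEPARATOR = '\n<hr style="border: 0; border-top: 4px solid #000;">\n'


def stacked_view(diffs, header=DEFAULT_HEADER, footer=DEFAULT_FOOTER):
    """Join cached DIFF chunks into one page for single-page review.

    diffs is a list of (edit_id, diff_html) pairs in display order.
    The diff_html is inserted as-is, since it is already rendered HTML
    from the cache. Each section gets an id so a click handler can map
    it back to the edit concerned.
    """
    sections = [
        f'<div class="diff" id="edit-{int(edit_id)}">\n{diff_html}\n</div>'
        for edit_id, diff_html in diffs
    ]
    return f"{header}\n{SEPARATOR.join(sections)}\n{footer}"
```

Because the result is one ordinary HTML page, the browser's normal text search works across all the selected DIFFs at once.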