This page is an archive. Do not edit the contents of this page. Please direct any additional comments to the current main page.
Continuation of the above, next phase on www.racingpost.com URLs (vs. bloodstock.racingpost.com)
Conversion types found:
Soft404 ("S404") examples to watch for:
Approach:
|publisher= for any of the "_id" URLs or anything with /horse|jockey|owner|trainer|results/. Use |work= for everything else. |publisher= and |work= to uniform values. -- Green C 03:54, 27 September 2020 (UTC)
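Read as a rule (|publisher= for the "_id" URLs and the /horse|jockey|owner|trainer|results/ paths, |work= for everything else), the approach comes down to a single regular-expression test. The following is a minimal Python sketch, not the bot's actual code. It assumes the rule is applied to the raw URL string, and the function name `citation_param` is hypothetical.

```python
import re

# "_id" anywhere in the URL (race_id=, horse_id=, ...) or one of the listed
# path segments, following the approach described above.
PUBLISHER_PATTERN = re.compile(r"_id|/(?:horse|jockey|owner|trainer|results)/")


def citation_param(url: str) -> str:
    """Return "publisher" or "work" for a racingpost.com URL (assumed rule)."""
    return "publisher" if PUBLISHER_PATTERN.search(url) else "work"
```

For example, both the legacy result_home.sd?race_id=... URLs and the new /profile/horse/... URLs fall under |publisher=. A hypothetical news-story path such as /news/some-story falls under |work=.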
http://bloodstock.racingpost.com/dam/dam_home.sd?horse_id=439081#damTabs=dam_progeny_sales which are now located at https://www.racingpost.com/profile/horse/439081/thimblerigger/progeny-sales. In those cases, the horse name is required in the URL (with modifications: lowercasing, removal of special characters, conversion of embedded spaces to hyphens), and the horse referred to is not necessarily the one in the article title, so the work probably can't readily be done by bot. But a list of such URLs would be useful so they can be fixed by hand. Colonies Chris (talk) 20:38, 27 September 2020 (UTC)
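The slug rules listed above are lowercasing, removing special characters, and turning spaces into hyphens. A minimal Python sketch of those rules follows, with several assumptions:

- "Special characters" is taken to mean anything other than a-z, 0-9, spaces and hyphens. The real site rules may differ; for instance, the course slug le-lion-d'angers in the course list later on this page keeps its apostrophe.
- The horse name still has to be supplied from the citation context.
- Both function names are hypothetical.

```python
import re


def horse_slug(name: str) -> str:
    """Lowercase, drop special characters, and turn spaces into hyphens.

    The set of characters kept (a-z, 0-9, space, hyphen) is an assumption.
    """
    cleaned = re.sub(r"[^a-z0-9 -]", "", name.lower())
    return re.sub(r"\s+", "-", cleaned.strip())


def profile_url(horse_id: int, name: str, tab: str = "") -> str:
    """Build a /profile/horse/<id>/<slug>/[<tab>] URL; tab is optional."""
    url = f"https://www.racingpost.com/profile/horse/{horse_id}/{horse_slug(name)}/"
    return url + tab if tab else url
```

With tab="progeny-sales" this reproduces the Thimblerigger URL above. With tab="form" it gives the form page for a horse such as Sixties Icon (642105).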
#damTabs=dam_progeny_sales); for that, the full URL including the horse name is required, and the dam_progeny_sales parameter has to change to a subfolder named progeny-sales. Does the bot handle that? Colonies Chris (talk) 22:33, 27 September 2020 (UTC)
https://www.racingpost.com/profile/horse/439081/thimblerigger/progeny-sales is a dead link (soft404). https://www.racingpost.com/profile/horse/439081 redirects to https://www.racingpost.com/profile/horse/439081/thimblerigger/, which works, and that is what the bot uses. -- Green C 23:32, 27 September 2020 (UTC)
https://www.racingpost.com/?authme -- Green C 23:37, 27 September 2020 (UTC)
There were 42 (!) URLs of this type that were converted: -- Green C 23:53, 27 September 2020 (UTC)
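Green C's point is that the bot does not need the horse name. The ID-only profile URL redirects to the named page. A minimal Python sketch of that step follows, with these assumptions and limits:

- It assumes the legacy URLs always carry horse_id in the query string.
- It only builds the URL and does not check the redirect.
- It lands on the profile's default page, not on a tab such as progeny-sales.
- The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def id_only_profile_url(legacy_url: str) -> Optional[str]:
    """Map a legacy ...?horse_id=NNN URL to the ID-only profile URL.

    urlsplit keeps the #fragment (tab state such as damTabs=...) separate
    from the query, so it is simply ignored here; the site's own redirect
    is expected to add the name slug.
    """
    ids = parse_qs(urlsplit(legacy_url).query).get("horse_id")
    if not ids or not ids[0].isdigit():
        return None
    return f"https://www.racingpost.com/profile/horse/{ids[0]}"
```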
I've worked out that for URLs of the form http://www.racingpost.com/horses/result_home.sd?race_id=267296&r_date=1999-06-04&popup=yes#results_top_tabs=re_&results_bottom_tabs=ANALYSIS, the correct destination is https://www.racingpost.com/results/10/catterick/1999-06-04/267296. The ID and date can be derived from the source URL, but the course (here, "catterick") has to come from context, and the digits after /results are a bit of a mystery; I suspect they may correspond to the position in an alphabetical list of courses. I derived the destination URL by entering the date and course in the advanced search facility at https://www.racingpost.com/results/, but I can't see any way this could be automated, unfortunately. Colonies Chris (talk) 22:06, 28 September 2020 (UTC)
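Once the course slug and its numeric ID are known, the rest of the conversion is mechanical. A minimal Python sketch follows, with these assumptions:

- The two map entries are the ones in this discussion's example URLs: catterick 10 here, and frankfurt 231 further down.
- The course itself must still be supplied from context, which is the part that resists automation.
- The function and map names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Partial map: only the two pairs that appear as examples on this page.
COURSE_IDS = {"catterick": 10, "frankfurt": 231}


def results_url(legacy_url: str, course: str, race_date: Optional[str] = None) -> str:
    """Rebuild a /results/<course id>/<course>/<date>/<race id> URL.

    race_id (and r_date, unless race_date is given) come from the legacy
    query string; the course slug must be supplied from context.
    """
    query = parse_qs(urlsplit(legacy_url).query)
    race_id = query["race_id"][0]
    date = race_date if race_date is not None else query["r_date"][0]
    return f"https://www.racingpost.com/results/{COURSE_IDS[course]}/{course}/{date}/{race_id}"
```

An unknown course, or a missing race_id or r_date, raises KeyError. Those URLs can then be set aside for hand-fixing rather than converted wrongly.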
<title>Results from the 2.35 race at CATTERICK - 4 June 1999 | Racing Post</title>. Then by extracting the working links I can determine there are 107 known combinations. For example:
The numbers and names appear to be consistent, so I now have a map. The final step is extracting the target name from the archive HTML title and finding it on the map. It probably won't always be an exact match, so a fuzzy match might be required. I'll work on it, hopefully this week. -- Green C 03:06, 29 September 2020 (UTC)
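The final step described above has three parts: read the course from the archived page's title, look it up in the map, and fall back to a fuzzy match. It can be sketched with Python's standard difflib, under these assumptions:

- Titles follow the pattern quoted above.
- A course slug is the title text lowercased, with spaces turned into hyphens.
- The 0.8 similarity cutoff is an arbitrary choice.
- The four map entries come from the course list on this page.
- The function name is hypothetical.

```python
import difflib
import re
from typing import Optional, Tuple

# Small sample of the course map (slug -> ID) from this page.
COURSE_IDS = {"catterick": 10, "del-mar": 444, "frankfurt": 231, "newton-abbot": 39}

# e.g. "Results from the 2.35 race at CATTERICK - 4 June 1999 | Racing Post"
TITLE_RE = re.compile(r"race at (.+?) - \d{1,2} [A-Za-z]+ \d{4}")


def course_from_title(title: str) -> Optional[Tuple[str, int]]:
    """Return (slug, ID) for the course named in an archived results title."""
    match = TITLE_RE.search(title)
    if not match:
        return None
    slug = re.sub(r"\s+", "-", match.group(1).strip().lower())
    if slug in COURSE_IDS:
        return slug, COURSE_IDS[slug]
    # Fuzzy fallback for near-misses such as a missing hyphen.
    close = difflib.get_close_matches(slug, list(COURSE_IDS), n=1, cutoff=0.8)
    return (close[0], COURSE_IDS[close[0]]) if close else None
```

For example, a hypothetical title naming "DELMAR" has no exact entry. It is resolved to del-mar (444) by the fuzzy step.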
http://www.racingpost.com/horses/horse_home.sd?horse_id=642105#topHorseTabs=horse_race_record&bottomHorseTabs=horse_form; this one seems fairly fixable: it should go to https://www.racingpost.com/profile/horse/642105/sixties-icon/form. Colonies Chris (talk) 09:18, 29 September 2020 (UTC)
url=http://www.racingpost.com/horses/result_home.sd?race_id=243391&r_date=7 September 1997&popup=yes#results_top_tabs=re_&results_bottom_tabs=ANALYSIS), so the bot has misinterpreted them. Otherwise, looking fine. I'm intrigued that the bot has rescued some citations from archives because they're not actually dead? (e.g. in Istabraq). Colonies Chris (talk) 13:32, 1 October 2020 (UTC)
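The example above has r_date=7 September 1997 rather than an ISO date like 1999-06-04. How the bot misread it isn't stated. One hedged fix is to normalize r_date to YYYY-MM-DD before building the /results/ path. A minimal Python sketch follows, with these caveats:

- The list of accepted formats is an assumption.
- `%B` matches English month names only under the default C locale.
- The function name is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Formats seen in legacy r_date= values on this page: ISO dates and
# human-readable ones such as "7 September 1997".
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y")


def normalize_r_date(value: str) -> Optional[str]:
    """Return value as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

Returning None instead of guessing lets unparseable dates be left for manual review.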
|archive-url= is for web archive URLs like web.archive.org or archive.today. Sometimes editors put the original URL there, thinking it automatically turns into an archive. But by taking the spot, it actually prevents bots from adding an archive URL when the link dies. -- Green C 13:40, 1 October 2020 (UTC)
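A bot can catch the mistake described above by checking that |archive-url= actually points at an archive host. A minimal Python sketch follows. Only the two services named above are listed, so a real check would need a fuller list of archive hosts. The function name is hypothetical.

```python
from urllib.parse import urlsplit

# Only the services mentioned above; other archive hosts exist.
ARCHIVE_HOSTS = ("web.archive.org", "archive.today")


def is_archive_url(url: str) -> bool:
    """True if url's host is (a subdomain of) a known archive service."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in ARCHIVE_HOSTS)
```

A |archive-url= value for which this returns False (for example an original racingpost.com URL) could be moved back to |url= or flagged for an editor.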
work=[[Racing Post]], it seems to consistently add it between the |last= and |first= parameters of the author name. Not fundamentally a problem, of course, but a little strange from the point of view of some later editor. Could it be placed elsewhere? Colonies Chris (talk) 16:28, 1 October 2020 (UTC)
|newspaper= then added |work=. It should be programmed to add it following the |url=, but there are some conditions (like when |url= is the last argument) where it will instead add it following the first argument, and if the first argument is |first= that's probably what happened. I'd have to see an example to know for sure. -- Green C 16:52, 1 October 2020 (UTC)
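The placement logic outlined above can be reproduced by modeling a template's parameters as an ordered list of (name, value) pairs. The following is a minimal Python sketch, not the bot's code, and the function name is hypothetical. It shows how a citation that begins with |last= and ends with |url= ends up with |work= between |last= and |first=.

```python
from typing import List, Tuple

Param = Tuple[str, str]  # (parameter name, value), in template order


def insert_after_url(params: List[Param], name: str, value: str) -> List[Param]:
    """Insert |name=value after |url=; if |url= is last or absent, after the first argument."""
    names = [n for n, _ in params]
    if "url" in names and names.index("url") < len(names) - 1:
        pos = names.index("url") + 1
    else:
        pos = 1  # fallback described above: right after the first argument
    return params[:pos] + [(name, value)] + params[pos:]
```

One possible alternative, offered only as a suggestion, is to append at the end when |url= is the final argument instead of falling back to position 1. That would avoid splitting the author name.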
| Course | ID |
|---|---|
| aqueduct | 255 |
| arlington-park | 276 |
| auteuil | 205 |
| baden-baden | 207 |
| ballingarry | ? |
| bangor-on-dee | 4 |
| brighton | 7 |
| cagnes-sur-mer | 216 |
| camden | ? |
| cartmel | 9 |
| caulfield | 469 |
| chester | 13 |
| churchill-downs | 308 |
| clonmel | 177 |
| cologne | 226 |
| compiegne | 291 |
| delaware-park | 248 |
| del-mar | 444 |
| delta-downs | ? |
| doha | 1196 |
| doomben | 467 |
| downpatrick | 179 |
| down-royal | 180 |
| dusseldorf | 240 |
| ellis-park | 638 |
| evry | ? |
| exeter | 14 |
| fair-grounds | 742 |
| fair-hill | ? |
| fakenham | 18 |
| flemington | 297 |
| folkestone | 19 |
| frankfurt | 231 |
| hawthorne | 604 |
| hollywood-park | ? |
| hoppegarten | 440 |
| huntingdon | 26 |
| kenilworth | 508 |
| kranji | 794 |
| la-zarzuela | 449 |
| le-lion-d'angers | 313 |
| lone-star-park | 674 |
| los-alamitos | 1307 |
| ludlow | 34 |
| lyon-parilly | 541 |
| market-rasen | 35 |
| monmouth-park | 253 |
| moonee-valley | 299 |
| musselburgh | 16 |
| nad-al-sheba | 483 |
| nancy | 559 |
| newton-abbot | 39 |
| parx | 578 |
| pimlico | 221 |
| pisa | 284 |
| plumpton | 44 |
| prairie-meadows | 808 |
| quakerstown | ? |
| randwick | 471 |
| rosehill | 311 |
| saint-brieuc | 713 |
| santa-anita | 257 |
| saratoga | 445 |
| sedgefield | 57 |
| southwell | 61 |
| stratford | 67 |
| taby | 271 |
| tampa-bay-downs | 724 |
| taunton | 73 |
| towcester | 83 |
| turin | ? |
| uttoxeter | 84 |
| wincanton | 90 |
| wissembourg | ? |
| worcester | 101 |
If an ID number were found for each course, the bot could convert those URLs from archived to live, probably by searching for the course on the website. -- Green C 01:45, 2 October 2020 (UTC)
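The course list above pairs each course slug with a numeric ID, or with ? where no ID has been found. A minimal Python sketch follows. It loads such a list from plain "slug ID slug ID ..." text (an assumed input format) into a dictionary and reports the courses still missing an ID. It assumes slugs never contain spaces, which holds for every entry listed. Both function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_course_map(text: str) -> Dict[str, Optional[int]]:
    """Parse "slug ID slug ID ..." text; "?" becomes None (ID unknown)."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected slug/ID pairs")
    return {
        slug: None if ident == "?" else int(ident)
        for slug, ident in zip(tokens[0::2], tokens[1::2])
    }


def missing_ids(course_map: Dict[str, Optional[int]]) -> List[str]:
    """Courses whose ID still has to be found, e.g. by searching the site."""
    return [slug for slug, ident in course_map.items() if ident is None]
```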
{{dead link}}
-- Green C 01:45, 2 October 2020 (UTC)
https://www.racingpost.com/results/, but it's often necessary to pick from several results, so that will have to be a manual process. Fortunately there aren't many of those. I suspect the 'story=' ones are gone completely and will just have to rely on archived versions. Colonies Chris (talk) 20:01, 5 October 2020 (UTC)
race_id ones, and found them all so far, e.g. http://www.racingpost.com/horses/result_home.sd?race_id=602790 --> https://www.racingpost.com/results/231/frankfurt/2014-05-11/602790, but that conversion only works if the racecourse name is available to the bot, so I suppose those will just have to be fixed by hand. Colonies Chris (talk) 09:46, 7 October 2020 (UTC)
r_date= -- Green C 13:51, 7 October 2020 (UTC)
When TV by the Numbers became defunct this past January, all of its ratings URLs became dead. The main URL, https://tvbythenumbers.zap2it.com, now just redirects to https://tvlistings.zap2it.com/?aid=gapzap (just the TV listings). For example, http://tvbythenumbers.zap2it.com/2016/09/22/wednesday-final-ratings-sept-21-2016 redirects to https://tvlistings.zap2it.com/?aid=gapzap. Is it possible for a bot to fix this problem? The dead TV by the Numbers URLs affect many American television series articles. -- YoungForever (talk) 21:54, 25 October 2020 (UTC)
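How the bot identified the dead TV by the Numbers links isn't stated here. One common heuristic for this kind of soft 404 is that a deep link ends up, after redirects, on a site's bare landing page. A minimal Python sketch follows. It assumes the final URL has already been obtained by following redirects; the fetch itself is omitted. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def redirected_to_landing_page(original: str, final: str) -> bool:
    """True if a URL with a real path ended up on a bare "/" landing page.

    Query strings (such as ?aid=gapzap) are ignored by this check.
    """
    had_path = urlsplit(original).path.strip("/") != ""
    lands_on_root = urlsplit(final).path.strip("/") == ""
    return had_path and lands_on_root
```

This would also flag legitimate redirects to a homepage, so the matches still need review before {{dead link}} or archive links are added.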
Results
Completed:
{{dead link}} (list available on request)
{{TV by the Numbers}} to square links with archives.
@YoungForever: If you see anything it missed, let me know. Good find, 43k is a lot. -- Green C 23:29, 26 October 2020 (UTC)