This article is rated Start-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later. |
The assertion that "the addition of a surrogate key will slow down access to the table, particularly if it is indexed" should be removed. It's not substantiated, incomplete, and misleading. It has no place in theoretical performance computations and in practice is almost always incorrect anyway. Finally, it's not a part of the definition.
I agree. An index on a surrogate key will be smaller (and therefore could be faster) than an index on a natural key. — Preceding unsigned comment added by 198.204.133.208 ( talk) 14:14, 25 April 2006 (UTC)
Some Anonymous Hero's decided that it's better to throw out everybody else's work and dump in some essay they wrote themselves. I'm marking this for wikification, and I'm going to wikify and merge in some of the old content if it's relevant when I've more time. -- Kgaughan 19:35, 13 December 2005 (UTC)
This article needs attention. The current posting is correct, though quite poorly written. It's certainly relational-centric.
The old article seems to confuse system-assigned keys with surrogate keys, particularly in this paragraph: ""Surrogate key" may also be known as "System-generated key", "Database Sequence number", "Synthetic key", "Technical key" or an "Arbitrary, unique identifier"."
I don't think either version thinks of "surrogate keys" from the point of view of data warehousing, though.
-- 03:01, 31 December 2005 (UTC) — Preceding unsigned comment added by Mikeblas ( talk • contribs) 03:00, 31 December 2005 (UTC)
Surely a surrogate key and a primary key are identical? Is there really any difference? — Preceding unsigned comment added by 82.198.250.9 ( talk) 07:13, 11 September 2006 (UTC)
I've added a summary of the main advantages and disadvantages of Surrogate Keys. I've also linked my own blog article; I realize that linking one's on blog is in poor taste, so if someone else has an article which extensively catalogs the disadvantages of surrogate keys, then please link it instead. Jberkus 19:07, 23 September 2006 (UTC)
Isn't surrogate keys more applied in the data warehousing for relating fact and dimension tables. Surrogate keys are also one of the solution for the " Slowly Changing Dimension" problem. I strongly feel this has to be included along with the following external link [1]. —The preceding unsigned comment was added by 203.123.182.27 ( talk) 05:19, 2 February 2007 (UTC).
"several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operation and object-to-row mapping."
Hibernate works fine with non-surrogate key, even with composite keys, see [2]. I know little about Ruby on Rails, but I very much doubt that they require surrogate keys.
"However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection." Really ? Isn't the point of some random generation schemes (random 128-bit UUIDs) to make the test redundant by making the collision probability definitely negligible ? Also can't the collision, if need be, be checked after rejection by the uniqueness constraint ? --FvdP ( talk) 16:12, 9 May 2008 (UTC)
The following subsection was added to the Disadvantages section: "For some queries, the existence of a surrogate key may result in the need for more |joins in order to retrieve the needed information. E.g., if a table of telephone numbers uses a surrogate key (some counter) instead of the natural key (the telephone number), referencing tables need to be joined with the telephone number table, in order to retrieve actual telephone numbers." I moved it here because it needs some more thought. This disadvantage is not a surrogate/natural key choice; instead, it is an over-normalization. See Database normalization. Timhowardriley ( talk) 22:05, 26 January 2009 (UTC)
The Query Optimization disadvantage was changed to read:
However, it originally read:
I don't see the added clarity. Actually, "slow down insert, and slow down deletes" is more clear than "slow down data-modifying operations." Also, why would you put an additional index on the surrogate key? It already has a unique index on it. Timhowardriley ( talk) 22:14, 26 January 2009 (UTC)
I ought to have read the discussion page before making edits, but I didn't. I also out to have logged on first - the edits are under ip address 124.178.225.104 . I have made a few changes where I thought the wording was a trifle clumsy, although I have tries to keep what I think the intended meaning was ("dissassociated" isn't a very good word, but I now what you mean).
I've also added a bit here and there.
The stuff on the caveats relating to randomly generated keys needs to be moved out, as it is duplicated. —Preceding unsigned comment added by Paul Murray ( talk • contribs) 11:13, 26 March 2009 (UTC)
The inadvertant assumption - that surrogate keys have meaning - can be solved by repeating "surrogate keys have _no_ meaning" in meetings. Also by peer reviewing for code that uses surrogate key values in anything other than joins. — Preceding unsigned comment added by 115.186.240.40 ( talk) 03:26, 10 February 2010 (UTC)
I removed this text. "Although applications accessing the database should be free of bugs and never attempt to create rows with duplicate natural keys, nevertheless properly maintained unique keys are a vital "last line of defense" against this type of data corruption. This is particularly the case where data is sometimes mutated by user-entered SQL. Of course, due to the very fact that a well-tested and bedded down application will not attempt to introduce duplicate keys, the absence of database constraints is often not immediately apparent as a problem. Modern databases, however, can apply database triggers to enforce unique constraints." It's meaning should be more clear. I'm not sure if it refutes or supports the previous sentence that says, "Without a unique index on the natural key, duplicate rows can appear and once present can be difficult to identify." Timhowardriley ( talk) 19:16, 13 May 2011 (UTC)
The example given in the Requirements changes section is not an argument for Oracle sequences or similar approaches, because those sequences would have been independent in the two separate companies as well (unless they used a shared database such as a clouid-hosted seevice). However, it seems like a good example for why to use UUIDs. Any objections to noting this?-- greenrd ( talk) 09:36, 2 February 2012 (UTC)
Is Definition (1) from P. Hall, J. Owlett, S. Todd, ``Relations and Entities, in ``Modeling in Data Base Management Systems, G.M. Nijssen (ed.), pp 201-220, North-Holland, 1976? — Preceding unsigned comment added by 91.37.175.230 ( talk) 08:04, 25 October 2012 (UTC)
"For example, table Staff may contain two rows for "John Smith", one row when he was employed between 1990 and 1999, another row when he was employed between 2001 and 2006.[] The surrogate key is identical (non-unique) in both rows however the primary key will be unique."
Above explanation from the article is the exact opposite of how Kimballl & Ross explain surrogate keys:
"We need two different product surrogate keys for the same SKU or physical barcode. Each of the separate surrogate keys identifies a unique product attribute profile that was true for a span of time."(Kimball & Ross 2002, The Data Warehouse Toolkit, p. 97)
Also how can a single surrogate key exist twice within one table if implemented via IDENTITY, AUTOINCREMENT, etc. as described? — Preceding unsigned comment added by At710 ( talk • contribs) 10:23, 9 October 2013 (UTC)
This article is rated Start-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later. |
The assertion that "the addition of a surrogate key will slow down access to the table, particularly if it is indexed" should be removed. It's not substantiated, incomplete, and misleading. It has no place in theoretical performance computations and in practice is almost always incorrect anyway. Finally, it's not a part of the definition.
I agree. An index on a surrogate key will be smaller (and therefore could be faster) than an index on a natural key. — Preceding unsigned comment added by 198.204.133.208 ( talk) 14:14, 25 April 2006 (UTC)
Some Anonymous Hero's decided that it's better to throw out everybody else's work and dump in some essay they wrote themselves. I'm marking this for wikification, and I'm going to wikify and merge in some of the old content if it's relevant when I've more time. -- Kgaughan 19:35, 13 December 2005 (UTC)
This article needs attention. The current posting is correct, though quite poorly written. It's certainly relational-centric.
The old article seems to confuse system-assigned keys with surrogate keys, particularly in this paragraph: ""Surrogate key" may also be known as "System-generated key", "Database Sequence number", "Synthetic key", "Technical key" or an "Arbitrary, unique identifier"."
I don't think either version thinks of "surrogate keys" from the point of view of data warehousing, though.
-- 03:01, 31 December 2005 (UTC) — Preceding unsigned comment added by Mikeblas ( talk • contribs) 03:00, 31 December 2005 (UTC)
Surely a surrogate key and a primary key are identical? Is there really any difference? — Preceding unsigned comment added by 82.198.250.9 ( talk) 07:13, 11 September 2006 (UTC)
I've added a summary of the main advantages and disadvantages of Surrogate Keys. I've also linked my own blog article; I realize that linking one's on blog is in poor taste, so if someone else has an article which extensively catalogs the disadvantages of surrogate keys, then please link it instead. Jberkus 19:07, 23 September 2006 (UTC)
Isn't surrogate keys more applied in the data warehousing for relating fact and dimension tables. Surrogate keys are also one of the solution for the " Slowly Changing Dimension" problem. I strongly feel this has to be included along with the following external link [1]. —The preceding unsigned comment was added by 203.123.182.27 ( talk) 05:19, 2 February 2007 (UTC).
"several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operation and object-to-row mapping."
Hibernate works fine with non-surrogate key, even with composite keys, see [2]. I know little about Ruby on Rails, but I very much doubt that they require surrogate keys.
"However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection." Really ? Isn't the point of some random generation schemes (random 128-bit UUIDs) to make the test redundant by making the collision probability definitely negligible ? Also can't the collision, if need be, be checked after rejection by the uniqueness constraint ? --FvdP ( talk) 16:12, 9 May 2008 (UTC)
The following subsection was added to the Disadvantages section: "For some queries, the existence of a surrogate key may result in the need for more |joins in order to retrieve the needed information. E.g., if a table of telephone numbers uses a surrogate key (some counter) instead of the natural key (the telephone number), referencing tables need to be joined with the telephone number table, in order to retrieve actual telephone numbers." I moved it here because it needs some more thought. This disadvantage is not a surrogate/natural key choice; instead, it is an over-normalization. See Database normalization. Timhowardriley ( talk) 22:05, 26 January 2009 (UTC)
The Query Optimization disadvantage was changed to read:
However, it originally read:
I don't see the added clarity. Actually, "slow down insert, and slow down deletes" is more clear than "slow down data-modifying operations." Also, why would you put an additional index on the surrogate key? It already has a unique index on it. Timhowardriley ( talk) 22:14, 26 January 2009 (UTC)
I ought to have read the discussion page before making edits, but I didn't. I also out to have logged on first - the edits are under ip address 124.178.225.104 . I have made a few changes where I thought the wording was a trifle clumsy, although I have tries to keep what I think the intended meaning was ("dissassociated" isn't a very good word, but I now what you mean).
I've also added a bit here and there.
The stuff on the caveats relating to randomly generated keys needs to be moved out, as it is duplicated. —Preceding unsigned comment added by Paul Murray ( talk • contribs) 11:13, 26 March 2009 (UTC)
The inadvertant assumption - that surrogate keys have meaning - can be solved by repeating "surrogate keys have _no_ meaning" in meetings. Also by peer reviewing for code that uses surrogate key values in anything other than joins. — Preceding unsigned comment added by 115.186.240.40 ( talk) 03:26, 10 February 2010 (UTC)
I removed this text. "Although applications accessing the database should be free of bugs and never attempt to create rows with duplicate natural keys, nevertheless properly maintained unique keys are a vital "last line of defense" against this type of data corruption. This is particularly the case where data is sometimes mutated by user-entered SQL. Of course, due to the very fact that a well-tested and bedded down application will not attempt to introduce duplicate keys, the absence of database constraints is often not immediately apparent as a problem. Modern databases, however, can apply database triggers to enforce unique constraints." It's meaning should be more clear. I'm not sure if it refutes or supports the previous sentence that says, "Without a unique index on the natural key, duplicate rows can appear and once present can be difficult to identify." Timhowardriley ( talk) 19:16, 13 May 2011 (UTC)
The example given in the Requirements changes section is not an argument for Oracle sequences or similar approaches, because those sequences would have been independent in the two separate companies as well (unless they used a shared database such as a clouid-hosted seevice). However, it seems like a good example for why to use UUIDs. Any objections to noting this?-- greenrd ( talk) 09:36, 2 February 2012 (UTC)
Is Definition (1) from P. Hall, J. Owlett, S. Todd, ``Relations and Entities, in ``Modeling in Data Base Management Systems, G.M. Nijssen (ed.), pp 201-220, North-Holland, 1976? — Preceding unsigned comment added by 91.37.175.230 ( talk) 08:04, 25 October 2012 (UTC)
"For example, table Staff may contain two rows for "John Smith", one row when he was employed between 1990 and 1999, another row when he was employed between 2001 and 2006.[] The surrogate key is identical (non-unique) in both rows however the primary key will be unique."
Above explanation from the article is the exact opposite of how Kimballl & Ross explain surrogate keys:
"We need two different product surrogate keys for the same SKU or physical barcode. Each of the separate surrogate keys identifies a unique product attribute profile that was true for a span of time."(Kimball & Ross 2002, The Data Warehouse Toolkit, p. 97)
Also how can a single surrogate key exist twice within one table if implemented via IDENTITY, AUTOINCREMENT, etc. as described? — Preceding unsigned comment added by At710 ( talk • contribs) 10:23, 9 October 2013 (UTC)