From Wikipedia, the free encyclopedia
{{Unreferenced|date=October 2008}}


{{dablink|This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see [[Database management system]].}}


A '''Computer Database''' is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a [[database model]]. The model in most common use today is the [[relational model]]. Other models such as the [[hierarchical model]] and the [[network model]] use a more explicit representation of relationships (see below for explanation of the various database models).
==Database management systems==
{{main|Database management system}}
This article does not cite any references or sources.
Please help improve this article by adding citations to reliable sources. Unverifiable material may be challenged and removed. (October 2008)
This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system.

A Computer Database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.


Contents

* 1 Database management systems
  * 1.1 Relational database management systems
  * 1.2 Post-relational database models
  * 1.3 Object database models
* 2 DBMS internals
  * 2.1 Storage and physical database design
    * 2.1.1 Indexing
  * 2.2 Transactions and concurrency
  * 2.3 Replication
  * 2.4 Security
  * 2.5 Locking
  * 2.6 Architecture
* 3 Applications of databases
* 4 Links to DBMS products
* 5 See also
* 6 References
* 7 External links

Database management systems

Main article: Database management system

Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables.
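
As a minimal sketch of what "no explicit pointers" means in practice, the following Python example uses the standard-library sqlite3 module. The supplier and part tables and their columns are hypothetical names chosen for illustration. The two tables are related only by a matching value (a supplier code), and the relationship is recovered with a join at query time.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE part (part_no INTEGER PRIMARY KEY,
                       description TEXT,
                       supplier_code TEXT);
    INSERT INTO supplier VALUES ('S1', 'Acme'), ('S2', 'Globex');
    INSERT INTO part VALUES (1, 'bolt', 'S1'), (2, 'nut', 'S1'), (3, 'gear', 'S2');
""")

# The link between part and supplier is just an equal value in two columns,
# not a stored pointer; the join matches the values when the query runs.
rows = conn.execute("""
    SELECT part.description, supplier.name
    FROM part JOIN supplier ON part.supplier_code = supplier.code
    ORDER BY part.part_no
""").fetchall()
print(rows)
```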

Post-relational database models

Several products have been identified as post-relational because their data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. Such models might be described as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK aka MultiValue, and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

DBMS internals

Storage and physical database design

Main article: Database storage structures


Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets, or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is often used to reduce join complexity and query execution time. [1]
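
The two partitioning schemes mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the choice of CRC32 as the hash, and the quarterly boundaries in RANGE_BOUNDS are all hypothetical and not taken from any particular DBMS.

```python
import zlib
from bisect import bisect_right


def hash_partition(key: str, partitions: int) -> int:
    """Spread keys evenly by hashing; CRC32 is an arbitrary stable hash."""
    return zlib.crc32(key.encode("utf-8")) % partitions


# Hypothetical boundaries: partition 0 holds months before 2008-04,
# partition 1 holds 2008-04 up to (not including) 2008-07, and so on.
RANGE_BOUNDS = ["2008-04", "2008-07", "2008-10"]


def range_partition(month: str) -> int:
    """Place a 'YYYY-MM' value in the partition whose range contains it."""
    return bisect_right(RANGE_BOUNDS, month)


print(range_partition("2008-01"), range_partition("2008-04"), range_partition("2008-12"))
print(hash_partition("supplier-42", 4))
```

Range partitioning keeps neighbouring values together, which suits queries over a period or region; hash partitioning trades that locality for an even spread of rows.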

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
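
The "sorted list with pointers" form of index described above can be sketched as follows. This is a toy model, not how any particular DBMS stores its indexes: the rows list stands in for a table, row positions stand in for row pointers, and the names are illustrative.

```python
from bisect import bisect_left, bisect_right

# A tiny "table": the list position of each row plays the role of a row pointer.
rows = [
    {"id": 0, "city": "Oslo"},
    {"id": 1, "city": "Lima"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Cairo"},
]

# The index: column values in sorted order, each paired with its row position.
index = sorted((row["city"], pos) for pos, row in enumerate(rows))
keys = [value for value, _ in index]


def lookup(value):
    """Find all rows with the given city using binary search on the index."""
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [rows[pos] for _, pos in index[lo:hi]]


print(lookup("Oslo"))
```

Binary search over the sorted keys avoids scanning every row, which is the speed-up the index provides.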

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL statement will produce the same result with or without an index. The RDBMS produces a plan of how to execute the query, generated by estimating the run times of the different strategies available and selecting the one it expects to be quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
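
The transparency of indexes can be observed with SQLite through Python's standard sqlite3 module. In this sketch the orders table, its columns, and the index name idx_orders_customer are hypothetical. The exact wording of the plan text varies between SQLite versions, so the example only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 50}", i) for i in range(1000)],
)

QUERY = "SELECT id, amount FROM orders WHERE customer = ?"


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in cursor)


# Same query, unchanged, before and after the index is created.
before_rows = conn.execute(QUERY, ("c7",)).fetchall()
before_plan = plan(QUERY, ("c7",))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

after_rows = conn.execute(QUERY, ("c7",)).fetchall()
after_plan = plan(QUERY, ("c7",))

print(before_plan)
print(after_plan)
print(sorted(before_rows) == sorted(after_rows))
```

The application code does not change; only the plan chosen by the engine does.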

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
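
A running ID number used as a primary key can be sketched with SQLite via Python's sqlite3 module. The supplier table is a hypothetical example. In SQLite, a column declared INTEGER PRIMARY KEY is assigned the next number automatically when no value is supplied, and a duplicate value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# No id supplied: SQLite assigns a running number.
first = conn.execute("INSERT INTO supplier (name) VALUES ('Acme')").lastrowid
second = conn.execute("INSERT INTO supplier (name) VALUES ('Globex')").lastrowid

# Reusing an existing id violates the uniqueness the primary key guarantees.
try:
    conn.execute("INSERT INTO supplier (id, name) VALUES (?, 'Duplicate')", (first,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)
```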

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

* Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
* Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
* Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
* Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
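
Atomicity and consistency from the list above can be illustrated with a small sketch using SQLite through Python's sqlite3 module. The account table, the CHECK constraint, and the transfer function are assumptions made for the example. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts; both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # would overdraw alice: rolled back
print(ok, failed, balances())
```

In the failed transfer the credit to the second account has already been executed when the constraint is violated, and the rollback undoes it.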

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

* Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
* Quorum: The result of read and write requests is determined by querying a "majority" of replicas.
* Multimaster: Two or more replicas synchronize with each other via a transaction identifier.
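
The idea that a logged sequence of actions is enough to build a duplicate can be sketched as follows. This is a toy in-memory model of master/slave replication, not the protocol of any real product; the class and function names are hypothetical.

```python
class Node:
    """A store that changes only by applying log entries in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        key, value = entry
        self.data[key] = value
        self.applied += 1


class Master(Node):
    """Accepts all writes and records each one in a log."""

    def __init__(self):
        super().__init__()
        self.log = []

    def write(self, key, value):
        entry = (key, value)
        self.log.append(entry)
        self.apply(entry)


def catch_up(master, replica):
    """Replay on the replica every log entry it has not yet applied."""
    for entry in master.log[replica.applied:]:
        replica.apply(entry)


master, replica = Master(), Node()
master.write("x", 1)
master.write("y", 2)
catch_up(master, replica)
master.write("x", 3)
stale = dict(replica.data)  # the replica lags until it catches up again
catch_up(master, replica)
print(stale, replica.data)
```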

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

* Access control restricts who can connect to the database and what they can do once connected.
* Auditing logs what action or change has been performed, when, and by whom.
* Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required, for example using SSL. (DSA and MD5, sometimes listed alongside it, are a signature algorithm and a hash function respectively; they support authentication and integrity checking rather than encryption.)
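
Access control and auditing can be sketched together in a few lines. The permission table, user names, and the authorize function below are purely illustrative assumptions; real systems manage privileges inside the DBMS itself.

```python
from datetime import datetime, timezone

# Hypothetical permission table: which actions each user may perform.
PERMISSIONS = {"alice": {"SELECT", "UPDATE"}, "bob": {"SELECT"}}

audit_log = []


def authorize(user, action, table):
    """Decide whether the action is allowed and record the attempt."""
    allowed = action in PERMISSIONS.get(user, set())
    audit_log.append(
        {
            "when": datetime.now(timezone.utc).isoformat(),  # when
            "user": user,                                    # by whom
            "action": action,                                # what
            "table": table,
            "allowed": allowed,
        }
    )
    return allowed


print(authorize("alice", "UPDATE", "supplier"))
print(authorize("bob", "UPDATE", "supplier"))
print(authorize("mallory", "SELECT", "supplier"))
```

Denied attempts are logged as well, since an audit trail is often most useful for the actions that were refused.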

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. Organizations based in the United Kingdom that hold personal data in electronic format (in databases, for example) are required to register with the Information Commissioner.[2]

Locking

Locking is how the database handles multiple concurrent operations. This is how concurrency and some form of basic integrity are managed within the database system. Locks can be applied at row level, or at other levels such as a page (a basic data block), an extent (a group of pages), or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

Unlike basic filesystem files or folders, where only one lock can be set at a time, restricting usage to a single process, a database can set and hold multiple locks at the same time at different levels of the physical data structure. How locks are set, and how long they last, is determined by the database engine's locking scheme, based on the SQL statements or transactions submitted by users. Generally speaking, no activity on the database should translate into no locking, or only very light locking.

For most DBMSs on the market, locks are generally shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object for as long as the exclusive lock lasts. Exclusive locks are usually set when the database needs to change data, as during an UPDATE or DELETE operation.

Shared locks can be held by several processes on the same data structure at the same time. Shared locks are usually used while the database is reading data, as during a SELECT operation. The number and nature of locks, and the time for which a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
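
The compatibility rule between shared and exclusive locks can be sketched with a small lock table. This is a simplified, non-blocking illustration in which a request is either granted or refused immediately. The LockTable class and its method names are hypothetical and do not model lock upgrades or waiting queues.

```python
class LockTable:
    """Tracks, per resource, the lock mode held and who holds it."""

    def __init__(self):
        self._locks = {}  # resource -> (mode, set of holders)

    def acquire(self, holder, resource, mode):
        """Grant the lock if compatible with existing locks, else refuse."""
        if mode not in ("shared", "exclusive"):
            raise ValueError(f"unknown lock mode: {mode}")
        current = self._locks.get(resource)
        if current is None:
            self._locks[resource] = (mode, {holder})
            return True
        held_mode, holders = current
        # Only shared + shared is compatible; anything involving an
        # exclusive lock conflicts.
        if mode == "shared" and held_mode == "shared":
            holders.add(holder)
            return True
        return False

    def release(self, holder, resource):
        _, holders = self._locks[resource]
        holders.discard(holder)
        if not holders:
            del self._locks[resource]


locks = LockTable()
print(locks.acquire("reader1", "row:42", "shared"))     # granted
print(locks.acquire("reader2", "row:42", "shared"))     # granted: readers coexist
print(locks.acquire("writer", "row:42", "exclusive"))   # refused while readers hold it
```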

Default locking behavior is determined by the isolation level of the data server. Changing the isolation level affects how shared and exclusive locks are set on data across the entire database system. The default isolation level is generally 1, at which data cannot be read while it is being modified, so that "ghost data" is never returned to the end user.

At some point, intensive or inappropriate exclusive locking can lead to a deadlock between two locks, in which neither lock can be released because each is waiting to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. When this happens, the sacrificed process or transaction is rolled back.
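
One common way to detect the situation described above is to look for a cycle in a "waits-for" graph, in which each transaction points at the transactions whose locks it is waiting for. The sketch below is an illustration of that general technique, not the mechanism of any specific DBMS. The function name is hypothetical, and picking the last transaction in the cycle as the victim is an arbitrary policy chosen for the example.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visiting, done = [], set()

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            # Reached a transaction already on the current path: a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# T1 waits for a lock held by T2, and T2 waits for a lock held by T1.
cycle = find_deadlock({"T1": ["T2"], "T2": ["T1"]})
victim = cycle[-1]  # arbitrary choice of which transaction to roll back
print(cycle, victim)
```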

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Databases are also locked for routine maintenance, which prevents changes being made while the maintenance is in progress. See "Locking tables and databases" (a section of IBM documentation) for more detail.

Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications, such as Google's BigTable or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
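
The difference between the two layouts can be sketched with plain Python data structures. The records and field names are hypothetical; the point is only which values sit together in each layout.

```python
# Row-oriented: each record is stored together, which suits reading or
# updating one whole record at a time (typical of OLTP).
rows = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 50},
]

# Column-oriented: each column is stored together, which suits scanning
# one attribute across many records (typical of analytical queries).
columns = {name: [row[name] for row in rows] for name in rows[0]}

record = rows[1]                      # one lookup returns the whole record
total_sales = sum(columns["sales"])   # one column scanned, others untouched
print(record, total_sales)
```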

Document-oriented databases, XML databases, knowledge bases, frame databases, and RDF stores (also known as triple stores) may also use a combination of these architectures in their implementation.

Finally, not all databases have or need a database "schema" (these are the so-called schema-less databases).

There are also other types of database that cannot be classified as relational databases.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

For example, a suppliers database contains data relating to suppliers, such as:

* supplier name
* supplier code
* supplier address

Databases are also often used by schools for teaching and for grading students.

Links to DBMS products

Main article: Category:Database management systems

* 4D
* ADABAS
* Alpha Five
* Apache Derby (Java, also known as IBM Cloudscape and Sun Java DB)
* BerkeleyDB
* CouchDB
* CSQL
* Datawasp
* Db4objects
* dBase
* FileMaker
* Firebird (database server)
* H2 (Java)
* Hsqldb (Java)
* IBM DB2
* IBM IMS (Information Management System)
* IBM UniVerse
* Informix
* Ingres
* Interbase
* InterSystems Caché
* MaxDB (formerly SapDB)
* Microsoft Access
* Microsoft SQL Server
* Model 204
* MySQL
* Nomad
* Objectivity/DB
* ObjectStore
* OpenLink Virtuoso
* OpenOffice.org Base
* Oracle Database
* Paradox (database)
* Polyhedra DBMS
* PostgreSQL
* Progress 4GL
* RDM Embedded
* ScimoreDB
* Sedna
* SQLite
* Superbase
* Sybase
* Teradata
* Vertica
* Visual FoxPro

See also

* Comparison of relational database management systems
* Comparison of database tools
* Database-centric architecture
* Database theory
* Government database
* Online database
* Real time database

References

Notes

1. ^ S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
2. ^ Information Commissioner's Office - ICO

Bibliography

* Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
* Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
* Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
* Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
* Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
* Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
* Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
* Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
* Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
* Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
* Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.

External links

* comp.databases.theory (Database Theory Discussion Group)
* Web page about FSQL: References and links about FSQL
* Increase Database Performance
* Database discussion forums
* The EM-DAT International Disaster Database
* The CE-DAT Complex Emergency Database

Finally it should be noted that not all database have or need a database 'schema' (so called schema-less databases). Also there are other types of database which cannot be classified as relational databases ==Applications of databases== Databases are used in many applications, spanning virtually the entire range of [[computer software]]. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that [[application software]] can use a common [[Application Programming Interface]] to retrieve the information stored in a database. Two commonly used database APIs are [[Java Database Connectivity|JDBC]] and [[ODBC]]. For example suppliers database contains the data relating to suppliers such as; *supplier name *supplier code *supplier address It is often used by schools to teach students and grade them. 
==Links to DBMS products== {{main|:Category:Database management systems}} <div style="-moz-column-count:3; column-count:3;"> *[[4th Dimension (Software)|4D]] *[[ADABAS]] *[[Alpha Five]] *[[Apache Derby]] (Java, also known as IBM Cloudscape and Sun Java DB) *[[BerkeleyDB]] *[[CouchDB]] *[[CSQL]] *[[Datawasp]] *[[Db4objects]] *[[dBase]] *[[FileMaker]] *[[Firebird (database server)]] *[[H2 (DBMS)|H2]] (Java) *[[Hsqldb]] (Java) *[[IBM DB2]] *[[Information Management System|IBM IMS (Information Management System)]] *[[IBM UniVerse]] *[[Informix]] *[[Ingres (database)|Ingres]] *[[Interbase]] *[[InterSystems Caché]] *[[MaxDB]] (formerly SapDB) *[[Microsoft Access]] *[[Microsoft SQL Server]] *[[Model 204]] *[[MySQL]] *[[Nomad software|Nomad]] *[[Objectivity/DB]] *[[ObjectStore]] *[[Virtuoso Universal Server|OpenLink Virtuoso]] *[[OpenOffice.org Base]] *[[Oracle Database]] *[[Paradox (database)]] *[[Polyhedra DBMS]] *[[PostgreSQL]] *[[Progress 4GL]] *[[RDM Embedded]] *[[ScimoreDB]] *[[Sedna (database)|Sedna]] *[[SQLite]] *[[Superbase database|Superbase]] *[[Sybase]] *[[Teradata]] *[[Vertica]] *[[Visual FoxPro]] </div> ==See also== * [[Comparison of relational database management systems]] * [[Comparison of database tools]] * [[Database-centric architecture]] * [[Database theory]] * [[Government database]] * [[Online database]] * [[Real time database]] ==References== ;Notes {{Reflist|2}} ;Bibliography {{refbegin}} * Connolly, Thomas, and Caroln Begg. ''Database Systems.'' New York: Harlow, 2002. * Date, C. J. ''An Introduction to Database Systems'', Eighth Edition, Addison Wesley, 2003. * Galindo, J., Urrutia, A., Piattini, M., ''Fuzzy Databases: Modeling, Design and Implementation'' ([[FSQL]] guide). Idea Group Publishing Hershey, USA, 2006. * Galindo, J., Ed. ''Handbook on Fuzzy Information Processing in Databases''. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008. * Gray, J. and Reuter, A. 
''Transaction Processing: Concepts and Techniques'', 1st edition, Morgan Kaufmann Publishers, 1992. * Kroenke, David M. ''Database Processing: Fundamentals, Design, and Implementation'' (1997), Prentice-Hall, Inc., pages 130-144. * Kroenke, David M., and David J. Auer. ''Database Concepts.'' 3rd ed. New York: Prentice, 2007. * Lightstone, S., T. Teorey, and T. Nadeau, ''Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more'', Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6. * Shih, J. "[http://www.pcticorp.com/assets/docs/PQL2b.pdf Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?]", white paper, 2007. * Teorey, T.; Lightstone, S. and Nadeau, T. ''Database Modeling & Design: Logical Design'', 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5 * Tukey, John W. ''Exploratory Data Analysis.'' Reading, MA: Addison Wesley, 1977. {{refend}} ==External links== * [http://groups.google.com/group/comp.databases.theory comp.databases.theory] (Database Theory Discussion Group) * [http://www.lcc.uma.es/~ppgg/FSQL/ Web page about FSQL]: References and links about [[FSQL]] * [http://www.visolve.com/database/MySQL_HPUX_Perf.pdf Increase Database Performance] * [http://www.sqlset.com/ Database discussion forums] * [http://www.emdat.be The EM-DAT International Disaster Database] * [http://www.cedat.be The CE-DAT Complex Emergency Database] {{Databases}} [[Category:Databases]] [[Category:Database management systems]] [[Category:Database theory]] [[af:Databasis]] [[ar:قاعدة بيانات]] [[az:Verilənlər bazası]] [[be:База дадзеных]] [[be-x-old:База дадзеных]] [[bs:Baza podataka]] [[br:Stlennvon]] [[bg:База данни]] [[ca:Base de dades]] [[cs:Databáze]] [[da:Database]] [[de:Datenbank]] [[et:Andmebaas]] [[el:Βάση δεδομένων]] [[es:Base de datos]] [[eo:Datumbazo]] [[eu:Datu-base]] [[fa:دادگان]] [[fr:Base de données]] [[ga:Bunachar sonraí]] [[gl:Base de datos]] [[ko:데이터베이스]] [[hi:डेटाबेस]] [[hr:Baza podataka]] 
[[id:Basis data]] [[ia:Base de datos]] [[is:Gagnagrunnur]] [[it:Database]] [[he:בסיס נתונים]] [[ka:მონაცემთა ბაზა]] [[ku:Danegeh]] [[lv:Datu bāze]] [[lt:Duomenų bazė]] [[hu:Adatbázis]] [[ml:ഡാറ്റാബേസ്]] [[ms:Pangkalan data]] [[nl:Database]] [[ja:データベース]] [[no:Database]] [[uz:Ma'lumotlar Bazasi]] [[pl:Baza danych]] [[pt:Banco de dados]] [[ro:Bază de date]] [[ru:База данных]] [[sq:Baza e të dhënave]] [[si:දත්ත සමුදාය]] [[simple:Database]] [[sk:Databáza]] [[sl:Podatkovna baza]] [[sr:База података]] [[sh:Baza podataka]] [[fi:Tietokanta]] [[sv:Databas]] [[tl:Database]] [[ta:தரவுத்தளம்]] [[th:ฐานข้อมูล]] [[vi:Cơ sở dữ liệu]] [[tr:Veri tabanı]] [[uk:База даних]] [[zh:数据库]]


Retrieved from "http://en.wikipedia.org/wiki/Database"
Categories: Database management systems | Databases | Database theory
Hidden categories: Articles lacking sources from October 2008 | All articles lacking sources | Articles to be expanded since June 2008 | All articles to be expanded


===Relational database management systems===
===Relational database management systems===

Revision as of 21:48, 18 November 2008

A computer database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (see below for explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.


Database management systems


This article does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unverifiable material may be challenged and removed. (October 2008)

This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system.



Contents [hide]

   * 1 Database management systems
         o 1.1 Relational database management systems
         o 1.2 Post-relational database models
         o 1.3 Object database models
   * 2 DBMS internals
         o 2.1 Storage and physical database design
               + 2.1.1 Indexing
         o 2.2 Transactions and concurrency
         o 2.3 Replication
         o 2.4 Security
         o 2.5 Locking
         o 2.6 Architecture
   * 3 Applications of databases
   * 4 Links to DBMS products
   * 5 See also
   * 6 References
   * 7 External links

   Main article: Database management system

Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

   The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables.
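As a rough illustration of the principle, the following sketch uses Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made up for the example. The two tables are related only through matching values in the author_id column; no pointer from one row to another is stored anywhere.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Date'), (2, 'Gray');
    INSERT INTO book VALUES ('An Introduction to Database Systems', 1),
                            ('Transaction Processing', 2);
""")

# The relationship is recovered at query time by comparing column values.
rows = conn.execute("""
    SELECT book.title, author.name
    FROM book JOIN author ON book.author_id = author.author_id
    ORDER BY book.title
""").fetchall()
```

Because the link is just a value, either table can be reorganized physically without breaking the relationship.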

Post-relational database models

Several products have been identified as post-relational because their data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. These models might be described as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK aka MultiValue, and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
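The conversion overhead that object databases try to avoid can be sketched as follows. This is a minimal example using Python's sqlite3 module and a dataclass; the Customer class, its fields and the helper functions are hypothetical names chosen for illustration. Every save and load has to translate between an object and a row by hand.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Application-side object (hypothetical example type)."""
    customer_id: int
    name: str
    city: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)


def save(conn, customer):
    # Object -> row: the fields are unpacked into column values.
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?)",
        (customer.customer_id, customer.name, customer.city),
    )


def load(conn, customer_id):
    # Row -> object: the column values are packed back into an object.
    row = conn.execute(
        "SELECT customer_id, name, city FROM customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None


save(conn, Customer(1, "Alice", "Oslo"))
loaded = load(conn, 1)
```

An object database aims to make the save and load steps unnecessary by storing the object in its own type system.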

DBMS internals

Storage and physical database design

   Main article: Database storage structures

Please help improve this section by expanding it. Further information might be found on the talk page or at requests for expansion. (June 2008)

Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is often used to reduce join complexity and reduce execution time for queries. [1]
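The trade-off between normalization and denormalization can be sketched with Python's sqlite3 module. The tables and data below are assumptions invented for the example. The normalized design stores each city name once and needs a join; the denormalized copy repeats the city name on every row and answers the same question without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the city name is stored once and referenced by value.
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, city_id INTEGER, amount INTEGER);
    INSERT INTO city VALUES (1, 'Oslo'), (2, 'Lima');
    INSERT INTO sale VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);

    -- Denormalized: the city name is repeated on every sale row.
    CREATE TABLE sale_flat AS
        SELECT sale.sale_id, city.city_name, sale.amount
        FROM sale JOIN city ON sale.city_id = city.city_id;
""")

# Same question, asked with and without a join.
normalized = conn.execute("""
    SELECT city.city_name, SUM(sale.amount)
    FROM sale JOIN city ON sale.city_id = city.city_id
    GROUP BY city.city_name ORDER BY city.city_name
""").fetchall()

denormalized = conn.execute("""
    SELECT city_name, SUM(amount)
    FROM sale_flat
    GROUP BY city_name ORDER BY city_name
""").fetchall()
```

The denormalized table is simpler to query but uses more storage, and renaming a city would require updating every matching row.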

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
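The "sorted list with pointers" form of index described above can be sketched in plain Python. This is only an illustration, not how any particular DBMS implements its indexes; the function names and sample rows are assumptions. Row positions in a list stand in for the pointers.

```python
from bisect import bisect_left, bisect_right

# A tiny "table": the position in the list plays the role of the row pointer.
rows = [
    {"id": 0, "name": "Carol"},
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Alice"},
]


def build_index(rows, column):
    """Return parallel lists: sorted column values and the matching row positions."""
    pairs = sorted((row[column], pos) for pos, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def lookup(index, rows, value):
    """Find all rows with the given value using binary search on the sorted keys."""
    keys, positions = index
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [rows[pos] for pos in positions[lo:hi]]


name_index = build_index(rows, "name")
alices = lookup(name_index, rows, "Alice")
```

The binary search inspects a number of entries that grows with the logarithm of the table size, whereas a scan without the index would inspect every row.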

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end user querying the database; while they affect performance, any SQL statement will return the same result with or without an index. The RDBMS produces a plan for how to execute the query, generated by estimating the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are the nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
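The transparency of indexes can be observed with SQLite through Python's sqlite3 module. The sketch below assumes a made-up person table and index name. The query text never changes; only the plan reported by SQLite's EXPLAIN QUERY PLAN statement does. The exact wording of the plan differs between SQLite versions, so the example only checks whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person (name, city) VALUES (?, ?)",
    [("Alice", "Oslo"), ("Bob", "Lima"), ("Carol", "Oslo")],
)

QUERY = "SELECT name FROM person WHERE city = ? ORDER BY name"


def plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Oslo",))]


def run(conn):
    return conn.execute(QUERY, ("Oslo",)).fetchall()


plan_before, result_before = plan(conn), run(conn)

# Adding the index requires no change to QUERY or to the calling code.
conn.execute("CREATE INDEX idx_person_city ON person (city)")

plan_after, result_after = plan(conn), run(conn)
```

The result is the same before and after; only the access path chosen by the planner differs.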

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
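A running ID number used as a primary key can be sketched with SQLite via Python's sqlite3 module. The record table is a hypothetical example. In SQLite, a column declared INTEGER PRIMARY KEY is assigned the next integer automatically when no value is supplied, and an attempt to reuse an existing key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, payload TEXT)")

# No id is given, so the database assigns a running number.
first = conn.execute("INSERT INTO record (payload) VALUES ('a')").lastrowid
second = conn.execute("INSERT INTO record (payload) VALUES ('b')").lastrowid

# Reusing an existing key violates the uniqueness guarantee.
try:
    conn.execute("INSERT INTO record (id, payload) VALUES (?, 'dup')", (first,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```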

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

   * Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
   * Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
   * Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
   * Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
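Atomicity in particular is easy to demonstrate. The following sketch uses Python's sqlite3 module with a hypothetical account table and transfer function. A transfer consists of two updates; if the second one violates a declared constraint, the first is undone as well, so the database never shows a half-finished transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)
```

The credit is deliberately executed before the debit so that the failing case leaves a partial change for the rollback to undo.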

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

   * Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves
   * Quorum: The result of read and write requests is determined by querying a "majority" of replicas.
   * Multimaster: Two or more replicas sync each other via a transaction identifier.

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
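The idea that a log of individual actions is enough to build a duplicate can be sketched in plain Python. This is a toy model of master/slave replication under simplifying assumptions (a single process, no failures, synchronous delivery); the class and method names are invented for the example.

```python
def apply_entry(data, entry):
    """Apply one logged action to a key-value store."""
    op, key, value = entry
    if op == "set":
        data[key] = value
    elif op == "delete":
        data.pop(key, None)


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        apply_entry(self.data, entry)
        self.applied += 1


class Master:
    def __init__(self):
        self.data = {}
        self.log = []       # ordered record of every write
        self.replicas = []

    def attach(self, replica):
        # A replica that joins late first replays the entries it missed.
        for entry in self.log[replica.applied:]:
            replica.apply(entry)
        self.replicas.append(replica)

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.log.append(entry)
        apply_entry(self.data, entry)
        for replica in self.replicas:  # all writes go through the master
            replica.apply(entry)


master = Master()
early = Replica()
master.attach(early)
master.write("set", "a", 1)
master.write("set", "b", 2)
master.write("delete", "a")

late = Replica()
master.attach(late)  # catches up purely from the log
```

Both replicas end up with the same contents as the master, although one of them never saw the writes as they happened.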

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

   * Access control restricts who can connect to the database and what they can do in it.
   * Auditing logs what action or change has been performed, when and by whom.
   * Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is stored in encrypted form in the tables and decrypted "on the fly" when a query comes in. Connections can also be secured and encrypted if required, for example using SSL. DSA and MD5, which are sometimes listed alongside it, are a digital signature algorithm and a hash function respectively, and serve authentication and integrity checking rather than encryption.
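Auditing is often implemented with triggers. The sketch below uses SQLite through Python's sqlite3 module; the salary and audit_log tables and the trigger name are assumptions made for the example. It records what changed and when. SQLite has no notion of database users, so the "by whom" part would have to be supplied by the application and is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salary (employee TEXT PRIMARY KEY, amount INTEGER);

    CREATE TABLE audit_log (
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP,  -- when
        action     TEXT,                            -- what kind of change
        employee   TEXT,
        old_amount INTEGER,
        new_amount INTEGER
    );

    -- Every update of salary writes one row to the audit log.
    CREATE TRIGGER salary_update_audit AFTER UPDATE ON salary
    BEGIN
        INSERT INTO audit_log (action, employee, old_amount, new_amount)
        VALUES ('UPDATE', OLD.employee, OLD.amount, NEW.amount);
    END;

    INSERT INTO salary VALUES ('alice', 1000);
    UPDATE salary SET amount = 1200 WHERE employee = 'alice';
""")

entries = conn.execute(
    "SELECT action, employee, old_amount, new_amount FROM audit_log"
).fetchall()
```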

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Information Commissioner's Office. UK-based organizations holding personal data in electronic format (in databases, for example) are required to register with the Information Commissioner.[2]

Locking

Please help improve this section by expanding it. Further information might be found on the talk page or at requests for expansion. (June 2008)

Locking is how the database handles multiple concurrent operations. It is how concurrency and some basic form of integrity are managed within the database system. Locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (a group of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

Unlike basic filesystem files or folders, where only one lock at a time can be set, restricting usage to a single process, a database can set and hold multiple locks at the same time at different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL or transactions submitted by users. Generally speaking, little or no activity on the database should translate into little or no locking.

For most DBMSs on the market, locks are generally shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object for as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, as during an UPDATE or DELETE operation.

Several shared locks can be held on the same data object at the same time. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of the locks, and the length of time a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
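The compatibility rule between shared and exclusive locks can be sketched as a small lock table in Python. This is a simplified, non-blocking model with invented names: a real DBMS would queue waiting requests, support lock upgrades and lock at several granularities, none of which is shown here.

```python
class LockTable:
    """Tracks which owners hold which kind of lock on each resource."""

    def __init__(self):
        self._locks = {}  # resource -> (mode, set of owners)

    def try_acquire(self, owner, resource, mode):
        """Return True if the lock is granted, False if it conflicts."""
        held = self._locks.get(resource)
        if held is None:
            self._locks[resource] = (mode, {owner})
            return True
        held_mode, owners = held
        # Only shared + shared is compatible; anything involving
        # an exclusive lock conflicts.
        if mode == "shared" and held_mode == "shared":
            owners.add(owner)
            return True
        return False

    def release(self, owner, resource):
        held = self._locks.get(resource)
        if held is None:
            return
        _, owners = held
        owners.discard(owner)
        if not owners:
            del self._locks[resource]


table = LockTable()
read_1 = table.try_acquire("T1", "row 42", "shared")
read_2 = table.try_acquire("T2", "row 42", "shared")      # readers coexist
write_3 = table.try_acquire("T3", "row 42", "exclusive")  # writer must wait

table.release("T1", "row 42")
table.release("T2", "row 42")
write_3_retry = table.try_acquire("T3", "row 42", "exclusive")
read_1_retry = table.try_acquire("T1", "row 42", "shared")  # blocked by writer
```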

Default locking behavior is determined by the isolation level of the database server. Changing the isolation level affects how shared and exclusive locks are set on the data for the entire database system. The default isolation level is generally 1 (often called read committed), under which data cannot be read while it is being modified, so that uncommitted ("ghost") data is never returned to the end user.

Intensive or inappropriate exclusive locking can lead to a deadlock between two transactions, in which neither lock can be released because each transaction is waiting for a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the transactions: it is rolled back and its locks are released, so that the other can proceed.
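One common way to detect such a situation is to look for a cycle in a "waits-for" graph. The sketch below is a simplified Python model in which each transaction waits for at most one other transaction; the function names and the victim policy (the last transaction in the cycle) are arbitrary assumptions, and real systems use their own criteria.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None.

    waits_for maps a transaction to the transaction it is waiting on.
    """
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            # Everything from the first repeated transaction onward is the cycle.
            return seen[seen.index(current):]
    return None


def abort(waits_for, victim):
    """Roll back the victim: it stops waiting and nobody waits on it any more."""
    return {
        txn: blocker
        for txn, blocker in waits_for.items()
        if txn != victim and blocker != victim
    }


# T1 waits for T2 and T2 waits for T1: a deadlock. T3 is merely queued behind T1.
waits_for = {"T3": "T1", "T1": "T2", "T2": "T1"}

cycle = find_cycle(waits_for)
victim = cycle[-1]                      # arbitrary choice for the sketch
remaining = abort(waits_for, victim)
```

After the victim is removed, the remaining transactions form a simple queue with no cycle and can complete one after another.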

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. See "Locking tables and databases" (a section in IBM documentation) for more detail.

Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
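The difference between the two layouts can be sketched with ordinary Python data structures. The sample data and function name are assumptions for illustration; real engines apply the same idea to blocks on disk. A row-oriented layout keeps each record together, while a column-oriented layout keeps each attribute together.

```python
# Row-oriented: one tuple per record, convenient for reading or
# writing a whole record at once (typical of OLTP workloads).
rows = [
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Carol", 35),
]


def to_columns(rows, names):
    """Re-arrange row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Column-oriented: one list per attribute, convenient for scanning a
# single attribute across many records (typical of analytical queries).
columns = to_columns(rows, ["id", "name", "age"])

whole_record = rows[1]                                    # touches one row
average_age = sum(columns["age"]) / len(columns["age"])   # touches one column
```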

Document-oriented databases, XML databases and knowledge bases, as well as frame databases and RDF stores (also known as triple stores), may also use a combination of these architectures in their implementation.

Finally, it should be noted that not all databases have or need a database schema (so-called schema-less databases).

There are also other types of database which cannot be classified as relational databases.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

For example, a suppliers database contains data relating to suppliers, such as:

   * supplier name
   * supplier code
   * supplier address
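Such a suppliers database could be sketched as a single table using Python's sqlite3 module. The column names follow the list above; the sample rows and the find_supplier helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier ("
    "supplier_code TEXT PRIMARY KEY, "
    "supplier_name TEXT NOT NULL, "
    "supplier_address TEXT)"
)
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [
        ("S001", "Acme Tools", "1 Example Road"),
        ("S002", "Bolt & Nut Co", "2 Sample Street"),
    ],
)


def find_supplier(conn, code):
    """Look up one supplier by its code; returns None if it does not exist."""
    return conn.execute(
        "SELECT supplier_name, supplier_address FROM supplier "
        "WHERE supplier_code = ?",
        (code,),
    ).fetchone()


found = find_supplier(conn, "S002")
```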

Databases are also often used by schools, both for teaching and for recording students' grades.

Links to DBMS products

   Main article: :Category:Database management systems
   * 4D
   * ADABAS
   * Alpha Five
   * Apache Derby (Java, also known as IBM Cloudscape and Sun Java DB)
   * BerkeleyDB
   * CouchDB
   * CSQL
   * Datawasp
   * Db4objects
   * dBase
   * FileMaker
   * Firebird (database server)
   * H2 (Java)
   * Hsqldb (Java)
   * IBM DB2
   * IBM IMS (Information Management System)
   * IBM UniVerse
   * Informix
   * Ingres
   * Interbase
   * InterSystems Caché
   * MaxDB (formerly SapDB)
   * Microsoft Access
   * Microsoft SQL Server
   * Model 204
   * MySQL
   * Nomad
   * Objectivity/DB
   * ObjectStore
   * OpenLink Virtuoso
   * OpenOffice.org Base
   * Oracle Database
   * Paradox (database)
   * Polyhedra DBMS
   * PostgreSQL
   * Progress 4GL
   * RDM Embedded
   * ScimoreDB
   * Sedna
   * SQLite
   * Superbase
   * Sybase
   * Teradata
   * Vertica
   * Visual FoxPro

See also

   * Comparison of relational database management systems
   * Comparison of database tools
   * Database-centric architecture
   * Database theory
   * Government database
   * Online database
   * Real time database

References

Notes

  1. ^ S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
  2. ^ Information Commissioner's Office - ICO

Bibliography

   * Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
   * Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
   * Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
   * Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
   * Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
   * Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
   * Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
   * Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
   * Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
   * Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
   * Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.

External links

   * comp.databases.theory (Database Theory Discussion Group)
   * Web page about FSQL: References and links about FSQL
   * Increase Database Performance
   * Database discussion forums
   * The EM-DAT International Disaster Database
   * The CE-DAT Complex Emergency Database




Retrieved from "http://en.wikipedia.org/wiki/Database". Categories: Database management systems | Databases | Database theory


Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables.
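The absence of pointers can be illustrated with a small sketch using Python's built-in sqlite3 module. The tables, columns and sample rows (supplier, part, supplier_code) are hypothetical and chosen only for illustration; the two tables are related solely because a value stored in one matches a value stored in the other.

```python
import sqlite3

def supplier_parts():
    """Relate two tables purely by matching column values (no pointers)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE part (part_no INTEGER PRIMARY KEY,
                           supplier_code TEXT, descr TEXT);
        INSERT INTO supplier VALUES ('S1', 'Acme'), ('S2', 'Globex');
        INSERT INTO part VALUES (1, 'S1', 'bolt'), (2, 'S2', 'nut'),
                                (3, 'S1', 'washer');
    """)
    # The join condition compares stored values; nothing in a part row
    # "points" at a supplier row.
    rows = conn.execute("""
        SELECT s.name, p.descr
        FROM supplier AS s JOIN part AS p ON p.supplier_code = s.code
        ORDER BY p.part_no
    """).fetchall()
    conn.close()
    return rows
```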

Post-relational database models

Several products have been identified as post-relational because the data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. These might be identified as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
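The conversion overhead described above can be seen in a minimal sketch, assuming a relational store accessed through Python's sqlite3 module. The Supplier class, the supplier table and both helper functions are hypothetical names introduced for illustration; the point is that every load and save has to translate between rows and objects by hand.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Supplier:
    code: str
    name: str

def save_supplier(conn, supplier):
    """Flatten an application object into a table row."""
    conn.execute("INSERT INTO supplier (code, name) VALUES (?, ?)",
                 (supplier.code, supplier.name))

def load_suppliers(conn):
    """Rebuild application objects from table rows."""
    rows = conn.execute("SELECT code, name FROM supplier ORDER BY code")
    return [Supplier(code, name) for code, name in rows]
```

An object database aims to make this translation layer unnecessary by storing the objects themselves.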

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

DBMS internals

Storage and physical database design

Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is conversely often used to reduce join complexity and reduce execution time for queries. [1]

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
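The "sorted list with pointers" form of index can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular DBMS stores its indexes; the function names build_index and lookup are hypothetical, and a list position stands in for the row pointer.

```python
import bisect

def build_index(rows, column):
    """Return a sorted list of (value, row_position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def lookup(index, rows, value):
    """Binary-search the index, then follow the stored positions to the rows."""
    # (value, -1) sorts before every real entry for `value`, because real
    # positions are never negative.
    start = bisect.bisect_left(index, (value, -1))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break
        matches.append(rows[pos])
    return matches
```

The binary search finds the first matching entry in logarithmic time, where a scan of the unindexed rows would have to inspect every row.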

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL statement will compute the same result with or without an index. The RDBMS will produce a plan of how to execute the query, which is generated by estimating the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
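The transparency of indexes can be checked with a short sketch using SQLite through Python's sqlite3 module. The customer table, the sample data and the index name idx_customer_city are hypothetical. The same query is run before and after the index is created; SQLite's EXPLAIN QUERY PLAN statement reports the strategy chosen, and its exact wording varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"

def plan(conn):
    """Return the planner's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Oslo",)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the text

def compare_with_and_without_index():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
    cities = ["Oslo", "Bern", "Rome", "Lima"]
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(f"customer{i:03d}", cities[i % 4]) for i in range(200)],
    )
    before = conn.execute(QUERY, ("Oslo",)).fetchall()
    plan_before = plan(conn)
    # The application's query text is not changed; only the index is added.
    conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
    after = conn.execute(QUERY, ("Oslo",)).fetchall()
    plan_after = plan(conn)
    conn.close()
    return before, after, plan_before, plan_after
```

The two result lists are identical; only the reported plan differs, changing from a scan of the whole table to a search that uses the new index.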

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
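The uniqueness guarantee and the running ID number can be sketched with SQLite via Python's sqlite3 module. The record table and its columns are hypothetical. In SQLite, a column declared INTEGER PRIMARY KEY is assigned the next integer automatically when no value is supplied.

```python
import sqlite3

def insert_duplicate_id():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO record (payload) VALUES ('first')")   # gets id 1
    conn.execute("INSERT INTO record (payload) VALUES ('second')")  # gets id 2
    try:
        # Reusing an existing key value violates the primary key.
        conn.execute("INSERT INTO record (id, payload) VALUES (1, 'clash')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    ids = [row[0] for row in conn.execute("SELECT id FROM record ORDER BY id")]
    conn.close()
    return rejected, ids
```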

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

  • Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
  • Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
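Atomicity and consistency can be illustrated with a minimal sketch using Python's sqlite3 module. The account table, its CHECK constraint and the function names are hypothetical. The transfer deliberately credits the destination first, so that a failed debit leaves a partial change that must be rolled back.

```python
import sqlite3

def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 20)")
    conn.commit()
    return conn

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE name = ?", (amount, dst))
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False
```

When the debit fails, the credit that was already applied inside the same transaction is undone as well, so other readers never observe money created from nothing.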

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
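One textbook way to test whether a schedule is serializable is conflict serializability: build a precedence graph between transactions and check that it has no cycle. The sketch below assumes a schedule given as a list of (transaction, operation, item) triples; this representation and the function name are illustrative choices, and real DBMSs usually enforce serializability through locking or similar protocols rather than by checking schedules after the fact.

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {"r", "w"}.

    Returns True if the precedence graph of conflicting operations is acyclic.
    """
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    # Repeatedly remove transactions with no incoming edge; if none can be
    # removed while some remain, the graph contains a cycle.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True
```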

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

  • Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
  • Quorum: The results of read and write requests are determined by querying a "majority" of replicas.
  • Multimaster: Two or more replicas sync each other via a transaction identifier.

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
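Log-based master/slave replication can be sketched as follows. This is a toy in-memory model under stated assumptions: the Master and Replica classes are hypothetical, the "database" is a Python dictionary, and the log is a list of (key, value) writes that each replica replays in order. Networking, failures and conflict handling are not modelled.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, log):
        """Replay any log entries this replica has not seen yet."""
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

class Master:
    def __init__(self, replicas):
        self.data = {}
        self.log = []  # ordered record of every write
        self.replicas = replicas

    def write(self, key, value):
        """All writes go to the master and are appended to its log."""
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        """Ship the log to every replica so it can catch up."""
        for replica in self.replicas:
            replica.apply(self.log)
```

A replica that has not yet received the latest log entries serves stale data until it catches up, which is the trade-off between asynchronous and synchronous replication.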

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

  • Access control restricts who can connect to the database and what they can do to it.
  • Auditing logs what action or change has been performed, when and by whom.
  • Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured if required, using mechanisms such as SSL for encryption, DSA for digital signatures and MD5 for hashing, or a legacy encryption standard.
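Auditing is often implemented with triggers that record each change in a separate table. The sketch below uses SQLite through Python's sqlite3 module; the salary and audit_log tables and the trigger name are hypothetical. Because SQLite has no user accounts, this example records what changed and when, but not by whom; a multi-user DBMS would typically also store the session's user name.

```python
import sqlite3

def audited_update():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE salary (employee TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE audit_log (
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
            employee TEXT, old_amount INTEGER, new_amount INTEGER
        );
        -- Every update of salary writes one row to the audit log.
        CREATE TRIGGER salary_audit AFTER UPDATE ON salary
        BEGIN
            INSERT INTO audit_log (employee, old_amount, new_amount)
            VALUES (OLD.employee, OLD.amount, NEW.amount);
        END;
        INSERT INTO salary VALUES ('ann', 1000);
        UPDATE salary SET amount = 1200 WHERE employee = 'ann';
    """)
    rows = conn.execute(
        "SELECT employee, old_amount, new_amount FROM audit_log").fetchall()
    conn.close()
    return rows
```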

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases is overseen by the Information Commissioner's Office. Organizations based in the United Kingdom that hold personal data in electronic format (databases, for example) are required to register with the Information Commissioner. [2]

Locking

Locking is how the database handles multiple concurrent operations; it is how concurrency and a basic form of integrity are managed within the database system. Locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (a group of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data. Unlike basic filesystem files or folders, where only one lock at a time can be set, restricting usage to a single process, a database can set and hold multiple locks at the same time on different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL statements or transactions submitted by users. Generally speaking, no activity on the database should translate into no locking, or only very light locking.

For most DBMSs on the market, locks are generally either shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object for as long as the exclusive lock lasts. Exclusive locks are usually set when the database needs to change data, as during an UPDATE or DELETE operation.

Shared locks can be held on the same data structure by several transactions at once. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of the locks, and the length of time for which a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
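The compatibility rule between shared and exclusive locks can be sketched as a small lock table. This is a simplified illustration: the LockTable class is hypothetical, a refused request simply returns False instead of queueing the transaction, and real lock managers add lock levels, queues and timeouts.

```python
class LockTable:
    def __init__(self):
        self.holders = {}  # resource -> (mode, set of transaction ids)

    def acquire(self, txn, resource, mode):
        """mode is 'S' (shared) or 'X' (exclusive).

        Returns True if the lock is granted, False if txn would have to wait.
        """
        held = self.holders.get(resource)
        if held is None:
            self.holders[resource] = (mode, {txn})
            return True
        held_mode, owners = held
        if owners == {txn}:
            # The sole owner may keep its lock or upgrade it to exclusive.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[resource] = (new_mode, owners)
            return True
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible with each other
            return True
        return False  # any combination involving 'X' conflicts

    def release(self, txn, resource):
        held = self.holders.get(resource)
        if held and txn in held[1]:
            held[1].discard(txn)
            if not held[1]:
                del self.holders[resource]
```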

Default locking behavior is enforced by the isolation level of the data server. Changing the isolation level affects how shared or exclusive locks must be set on the data for the entire database system. The default isolation level is generally 1, at which data cannot be read while it is being modified, so that "ghost data" is never returned to the end user.

Intensive or inappropriate exclusive locking can eventually lead to a deadlock between two locks, in which neither lock can be released because each is trying to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. When it does so, the process or transaction that held the sacrificed lock is rolled back.

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. See "Locking tables and databases" (a section in IBM documentation) for more detail.
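Deadlock detection is commonly described in terms of a wait-for graph: a deadlock exists when the transactions waiting on one another form a cycle. The sketch below assumes each transaction waits for at most one other, represented as a dictionary; the function names and the policy of rolling back the youngest transaction are illustrative assumptions, since the victim selection rule differs between DBMSs.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction to the transaction it is waiting on.

    Returns the list of transactions forming a cycle, or None.
    """
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None

def choose_victim(cycle, start_times):
    """Pick the youngest transaction (latest start time) to roll back."""
    return max(cycle, key=lambda txn: start_times[txn])
```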

Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line transaction processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications such as Google's BigTable, or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
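The difference between the two layouts can be sketched with plain in-memory structures. This is an illustration only; the sample records are made up, and a real datastore lays the data out on disk rather than in dictionaries.

```python
rows = [
    {"id": 1, "title": "Database Systems", "year": 2002},
    {"id": 2, "title": "Transaction Processing", "year": 1992},
    {"id": 3, "title": "Exploratory Data Analysis", "year": 1977},
]

# Row-oriented layout: each record is stored together, which suits
# OLTP-style work that reads or writes one whole record at a time.
row_store = {row["id"]: row for row in rows}
one_record = row_store[2]

# Column-oriented layout: each column is stored together, which suits
# retrieval-focused queries that scan a few columns across many records.
column_store = {name: [row[name] for row in rows] for name in rows[0]}
earliest_year = min(column_store["year"])

print(one_record["title"], earliest_year)
```

Fetching one record touches a single entry in the row store, whereas the aggregate over "year" touches a single list in the column store and never reads the titles.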

Document-oriented, XML, and knowledge-base databases, as well as frame databases and RDF stores (also known as triple stores), may also use a combination of these architectures in their implementation.

Finally, it should be noted that not all databases have or need a database 'schema' (so-called schema-less databases).

There are also other types of database that cannot be classified as relational databases.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

For example, a suppliers database contains data relating to suppliers, such as:

  • supplier name
  • supplier code
  • supplier address

Databases are also often used by schools, both in teaching students and in recording their grades.

See also

References

Notes
  1. S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
  2. Information Commissioner's Office - ICO
Bibliography
  • Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
  • Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
  • Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
  • Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
  • Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
  • Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
  • Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.
This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system.

A Computer Database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.


Contents

* 1 Database management systems
  * 1.1 Relational database management systems
  * 1.2 Post-relational database models
  * 1.3 Object database models
* 2 DBMS internals
  * 2.1 Storage and physical database design
    * 2.1.1 Indexing
  * 2.2 Transactions and concurrency
  * 2.3 Replication
  * 2.4 Security
  * 2.5 Locking
  * 2.6 Architecture
* 3 Applications of databases
* 4 Links to DBMS products
* 5 See also
* 6 References
* 7 External links

Database management systems

Main article: Database management system

Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables.
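The absence of pointers can be illustrated with two small tables whose only connection is a shared value. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names (supplier, part, supplier_code) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (code TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE part (part_no INTEGER PRIMARY KEY, description TEXT, supplier_code TEXT);
    INSERT INTO supplier VALUES ('S1', 'Acme', '1 Main Street');
    INSERT INTO part VALUES (100, 'bolt', 'S1'), (101, 'nut', 'S1');
""")

# The link between a part and its supplier is just a matching value
# ('S1') present in both tables; no pointer is stored anywhere.
linked = conn.execute("""
    SELECT part.description, supplier.name
    FROM part JOIN supplier ON part.supplier_code = supplier.code
    ORDER BY part.part_no
""").fetchall()
print(linked)
```

The join recovers the relationship purely by comparing column values, which is what the Information Principle requires.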

Post-relational database models

Several products have been identified as post-relational because the data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. These might be identified as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
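The conversion overhead that object databases aim to avoid can be seen in a minimal sketch of hand-written mapping code between objects and rows. The Supplier class and the save and load helpers are hypothetical, and Python's sqlite3 module stands in for a relational DBMS.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Supplier:
    code: str
    name: str
    address: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (code TEXT, name TEXT, address TEXT)")


def save(supplier):
    # Object -> row: the object's fields are flattened into column values.
    conn.execute(
        "INSERT INTO supplier VALUES (?, ?, ?)",
        (supplier.code, supplier.name, supplier.address),
    )


def load(code):
    # Row -> object: the column values are reassembled into an object.
    row = conn.execute(
        "SELECT code, name, address FROM supplier WHERE code = ?", (code,)
    ).fetchone()
    return Supplier(*row) if row else None


save(Supplier("S1", "Acme", "1 Main Street"))
print(load("S1"))
```

Every class in the application needs a pair of conversions like these when the database and the program use different type systems; an object database aims to make them unnecessary.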

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

DBMS internals

Storage and physical database design

Main article: Database storage structures


Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets, or B+ trees. These have various advantages and disadvantages, discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month or location), the creation of pre-computed views known as materialized views, and the partitioning of data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is conversely often used to reduce join complexity and query execution time. [1]
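The two partitioning schemes mentioned above can be sketched as functions that map a row to a partition number. This is an illustration under assumed parameters: quarterly ranges and four hash partitions are arbitrary choices, and the function names are hypothetical.

```python
import zlib


def range_partition(month):
    """Range partitioning: each quarter of the year goes to its own partition."""
    return (month - 1) // 3  # months 1-3 -> 0, 4-6 -> 1, 7-9 -> 2, 10-12 -> 3


def hash_partition(key, partitions=4):
    """Hash partitioning: a stable hash of the key picks the partition."""
    return zlib.crc32(key.encode("utf-8")) % partitions


print([range_partition(m) for m in (1, 3, 4, 12)])
print(hash_partition("supplier-17") == hash_partition("supplier-17"))
```

Range partitioning keeps neighbouring values together, which helps queries over a period, while hash partitioning spreads keys evenly regardless of their order.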

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
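The "sorted list with pointers" form of index can be sketched with Python's bisect module. This is a toy model only: the table is a list, the "pointer" is a list position, and the names table, name_index and lookup are hypothetical.

```python
import bisect

# The "table": row position -> (supplier code, supplier name).
table = [("S3", "Cobalt"), ("S1", "Acme"), ("S4", "Acme"), ("S2", "Birch")]

# The index: (column value, row position) pairs kept in sorted order.
name_index = sorted((name, pos) for pos, (_, name) in enumerate(table))


def lookup(name):
    """Find all rows with the given name by binary search on the index."""
    start = bisect.bisect_left(name_index, (name, -1))
    matches = []
    for value, pos in name_index[start:]:
        if value != name:
            break
        matches.append(table[pos])
    return matches


print(lookup("Acme"))
```

The binary search finds the first matching entry in logarithmic time, after which the matching rows are adjacent in the index even though they are scattered in the table.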

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end user querying the database: they affect performance, but any SQL statement computes the same result with or without an index. The RDBMS produces a plan for executing the query by estimating the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
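The transparency of indexes can be observed with SQLite through Python's sqlite3 module, as a stand-in for any RDBMS. In this sketch the table, the data and the index name idx_supplier_name are made up; EXPLAIN QUERY PLAN is SQLite's own facility, and its wording varies between versions, so the plans are printed without assuming their exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (code TEXT, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(f"S{i}", f"name{i % 50}", f"{i} Main Street") for i in range(1000)],
)

query = "SELECT code FROM supplier WHERE name = 'name7' ORDER BY code"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)
result_before = conn.execute(query).fetchall()

# Adding the index changes nothing in the application's SQL.
conn.execute("CREATE INDEX idx_supplier_name ON supplier (name)")

plan_after = plan(query)
result_after = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(result_before == result_after)
```

The query text is identical in both runs; only the plan chosen by the engine differs, and the rows returned are the same.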

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

* Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
* Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
* Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
* Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
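Atomicity and consistency can be illustrated with a transfer between two accounts, again using SQLite through Python's sqlite3 module as a stand-in. This is a minimal sketch; the account table, the CHECK constraint and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount):
    """Move money from alice to bob; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(30)
failed = transfer(500)  # would make alice's balance negative
print(ok, failed, balances())
```

In the failed transfer, the credit to bob has already been executed when the debit violates the constraint, yet the rollback undoes it, so no partial result survives.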

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

* Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
* Quorum: The result of read and write requests is calculated by querying a "majority" of replicas.
* Multimaster: Two or more replicas sync each other via a transaction identifier.
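The quorum idea in the list above can be sketched in a few lines. This is a simplified model under stated assumptions: replicas are in-memory objects, each value carries a version number, and the names Replica, quorum_write and quorum_read are hypothetical.

```python
class Replica:
    def __init__(self):
        self.version, self.value = 0, None


def quorum_write(replicas, reachable, value, version):
    """Write to the reachable replicas; succeed only if they form a majority."""
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        return False
    for i in reachable:
        replicas[i].version, replicas[i].value = version, value
    return True


def quorum_read(replicas, reachable):
    """Read from a majority and return the value with the highest version."""
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        raise RuntimeError("not enough replicas reachable for a quorum")
    newest = max((replicas[i] for i in reachable), key=lambda r: r.version)
    return newest.value


replicas = [Replica() for _ in range(3)]
wrote = quorum_write(replicas, [0, 1], "v1", version=1)  # replica 2 is unreachable
value = quorum_read(replicas, [1, 2])                    # overlaps the write at replica 1
print(wrote, value)
```

Because any two majorities of the same replica set share at least one member, a majority read always sees at least one copy of the latest majority write.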

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

* Access control ensures and restricts who can connect and what can be done to the database.
* Auditing logs what action or change has been performed, when and by whom.
* Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required, using DSA, MD5, SSL, or legacy encryption standards.
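Auditing, as described in the list above, is often implemented inside the database itself. The following is a minimal sketch using a trigger in SQLite through Python's sqlite3 module; the tables, the trigger name and the changed_by column are hypothetical, and because SQLite has no notion of a logged-in user, the user name is passed in as ordinary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (code TEXT PRIMARY KEY, address TEXT, changed_by TEXT);
    CREATE TABLE audit_log (
        at TEXT DEFAULT CURRENT_TIMESTAMP,  -- when
        who TEXT,                           -- by whom
        action TEXT                         -- what
    );
    CREATE TRIGGER supplier_audit AFTER UPDATE ON supplier
    BEGIN
        INSERT INTO audit_log (who, action)
        VALUES (NEW.changed_by,
                'address of ' || NEW.code || ': ' || OLD.address || ' -> ' || NEW.address);
    END;
    INSERT INTO supplier VALUES ('S1', '1 Main Street', 'loader');
""")

conn.execute(
    "UPDATE supplier SET address = '2 High Street', changed_by = 'alice' WHERE code = 'S1'"
)
log = conn.execute("SELECT who, action FROM audit_log").fetchall()
print(log)
```

The application only issues the UPDATE; the log entry recording what changed and by whom is written by the database, with the timestamp filled in by the column default.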

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom based organizations holding personal data in electronic format (databases, for example) are required to register with the Information Commissioner.[2]


Links to DBMS products

Main article: Category:Database management systems

* 4D
* ADABAS
* Alpha Five
* Apache Derby (Java, also known as IBM Cloudscape and Sun Java DB)
* BerkeleyDB
* CouchDB
* CSQL
* Datawasp
* Db4objects
* dBase
* FileMaker
* Firebird (database server)
* H2 (Java)
* Hsqldb (Java)
* IBM DB2
* IBM IMS (Information Management System)
* IBM UniVerse
* Informix
* Ingres
* Interbase
* InterSystems Caché
* MaxDB (formerly SapDB)
* Microsoft Access
* Microsoft SQL Server
* Model 204
* MySQL
* Nomad
* Objectivity/DB
* ObjectStore
* OpenLink Virtuoso
* OpenOffice.org Base
* Oracle Database
* Paradox (database)
* Polyhedra DBMS
* PostgreSQL
* Progress 4GL
* RDM Embedded
* ScimoreDB
* Sedna
* SQLite
* Superbase
* Sybase
* Teradata
* Vertica
* Visual FoxPro

See also

* Comparison of relational database management systems
* Comparison of database tools
* Database-centric architecture
* Database theory
* Government database
* Online database
* Real time database


External links

* comp.databases.theory (Database Theory Discussion Group)
* Web page about FSQL: References and links about FSQL
* Increase Database Performance
* Database discussion forums
* The EM-DAT International Disaster Database
* The CE-DAT Complex Emergency Database

United Kingdom based organizations holding personal data in electronic format (databases for example) are required to register with the Data Commissioner.<ref>[http://www.ico.gov.uk/ Information Commissioner's Office - ICO<!-- Bot generated title -->]</ref> ===Locking=== {{Expand-section|date=June 2008}} [[Lock (computer science)|Locking]] is how the database handles multiple concurrent operations. This is how concurrency and some form of basic integrity is managed within the database system. Such locks can be applied on a row level, or on other levels like page (a basic data block), extend (multiple array of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the '''same''' data. Unlike a basic filesystem files or folders, where only one lock at the time can be set, restricting the usage to one process only. A database can set and hold mutiple locks at the same time on the different level of the physical data structure. How locks are set, last is determined by the database engine locking scheme based on the submitted SQL or transactions by the users. Generally speaking, no activity on the database should be translated by no or very light locking. For most DBMS systems existing on the market, locks are generally '''shared''' or '''exclusive'''. Exclusive locks mean that no other lock can acquire the current data object as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, like during an UPDATE or DELETE operation. Shared locks can take ownership one from the other of the current data structure. Shared locks are usually used while the database is reading data, during a SELECT operation. The number, nature of locks and time the lock holds a data block can have a huge impact on the database performances. 
Bad locking can lead to disastrous performance response (usually the result of poor SQL requests, or inadequate database physical structure) Default locking behavior is enforced by the '''isolation level''' of the dataserver. Changing the isolation level will affect how shared or exclusive locks must be set on the data for the entire database system. Default isolation is generally 1, where data can not be read while it is modified, forbidding to return "ghost data" to end user. At some point intensive or inappropriate exclusive locking, can lead to the "dead lock" situation between two locks. Where none of the locks can be released because they try to acquire resources mutually from each other. The Database has a fail safe mechanism and will automatically "sacrifice" one of the locks releasing the resource. Doing so processes or transactions involved in the "dead lock" will be rolled back. Databases can also be locked for other reasons, like access restrictions for given levels of user. Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. See [http://publib.boulder.ibm.com/infocenter/rbhelp/v6r3/index.jsp?topic=/com.ibm.redbrick.doc6.3/wag/wag80.htm "Locking tables and databases" (section in some documentation / explanation from IBM)] for more detail.) ===Architecture=== Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing systems (OLTP) often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like [[Google]]'s [[BigTable]], or bibliographic database(library catalogue) systems may use a [[Column-oriented DBMS]] architecture. Document-Oriented, XML, Knowledgebases, as well as frame databases and rdf-stores (aka Triple-Stores), may also use a combination of these architectures in their implementation. 
Finally it should be noted that not all database have or need a database 'schema' (so called schema-less databases). Also there are other types of database which cannot be classified as relational databases ==Applications of databases== Databases are used in many applications, spanning virtually the entire range of [[computer software]]. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that [[application software]] can use a common [[Application Programming Interface]] to retrieve the information stored in a database. Two commonly used database APIs are [[Java Database Connectivity|JDBC]] and [[ODBC]]. For example suppliers database contains the data relating to suppliers such as; *supplier name *supplier code *supplier address It is often used by schools to teach students and grade them. 
==Links to DBMS products== {{main|:Category:Database management systems}} <div style="-moz-column-count:3; column-count:3;"> *[[4th Dimension (Software)|4D]] *[[ADABAS]] *[[Alpha Five]] *[[Apache Derby]] (Java, also known as IBM Cloudscape and Sun Java DB) *[[BerkeleyDB]] *[[CouchDB]] *[[CSQL]] *[[Datawasp]] *[[Db4objects]] *[[dBase]] *[[FileMaker]] *[[Firebird (database server)]] *[[H2 (DBMS)|H2]] (Java) *[[Hsqldb]] (Java) *[[IBM DB2]] *[[Information Management System|IBM IMS (Information Management System)]] *[[IBM UniVerse]] *[[Informix]] *[[Ingres (database)|Ingres]] *[[Interbase]] *[[InterSystems Caché]] *[[MaxDB]] (formerly SapDB) *[[Microsoft Access]] *[[Microsoft SQL Server]] *[[Model 204]] *[[MySQL]] *[[Nomad software|Nomad]] *[[Objectivity/DB]] *[[ObjectStore]] *[[Virtuoso Universal Server|OpenLink Virtuoso]] *[[OpenOffice.org Base]] *[[Oracle Database]] *[[Paradox (database)]] *[[Polyhedra DBMS]] *[[PostgreSQL]] *[[Progress 4GL]] *[[RDM Embedded]] *[[ScimoreDB]] *[[Sedna (database)|Sedna]] *[[SQLite]] *[[Superbase database|Superbase]] *[[Sybase]] *[[Teradata]] *[[Vertica]] *[[Visual FoxPro]] </div> ==See also== * [[Comparison of relational database management systems]] * [[Comparison of database tools]] * [[Database-centric architecture]] * [[Database theory]] * [[Government database]] * [[Online database]] * [[Real time database]] ==References== ;Notes {{Reflist|2}} ;Bibliography {{refbegin}} * Connolly, Thomas, and Caroln Begg. ''Database Systems.'' New York: Harlow, 2002. * Date, C. J. ''An Introduction to Database Systems'', Eighth Edition, Addison Wesley, 2003. * Galindo, J., Urrutia, A., Piattini, M., ''Fuzzy Databases: Modeling, Design and Implementation'' ([[FSQL]] guide). Idea Group Publishing Hershey, USA, 2006. * Galindo, J., Ed. ''Handbook on Fuzzy Information Processing in Databases''. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008. * Gray, J. and Reuter, A. 
''Transaction Processing: Concepts and Techniques'', 1st edition, Morgan Kaufmann Publishers, 1992. * Kroenke, David M. ''Database Processing: Fundamentals, Design, and Implementation'' (1997), Prentice-Hall, Inc., pages 130-144. * Kroenke, David M., and David J. Auer. ''Database Concepts.'' 3rd ed. New York: Prentice, 2007. * Lightstone, S., T. Teorey, and T. Nadeau, ''Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more'', Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6. * Shih, J. "[http://www.pcticorp.com/assets/docs/PQL2b.pdf Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?]", white paper, 2007. * Teorey, T.; Lightstone, S. and Nadeau, T. ''Database Modeling & Design: Logical Design'', 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5 * Tukey, John W. ''Exploratory Data Analysis.'' Reading, MA: Addison Wesley, 1977. {{refend}} ==External links== * [http://groups.google.com/group/comp.databases.theory comp.databases.theory] (Database Theory Discussion Group) * [http://www.lcc.uma.es/~ppgg/FSQL/ Web page about FSQL]: References and links about [[FSQL]] * [http://www.visolve.com/database/MySQL_HPUX_Perf.pdf Increase Database Performance] * [http://www.sqlset.com/ Database discussion forums] * [http://www.emdat.be The EM-DAT International Disaster Database] * [http://www.cedat.be The CE-DAT Complex Emergency Database] {{Databases}} [[Category:Databases]] [[Category:Database management systems]] [[Category:Database theory]] [[af:Databasis]] [[ar:قاعدة بيانات]] [[az:Verilənlər bazası]] [[be:База дадзеных]] [[be-x-old:База дадзеных]] [[bs:Baza podataka]] [[br:Stlennvon]] [[bg:База данни]] [[ca:Base de dades]] [[cs:Databáze]] [[da:Database]] [[de:Datenbank]] [[et:Andmebaas]] [[el:Βάση δεδομένων]] [[es:Base de datos]] [[eo:Datumbazo]] [[eu:Datu-base]] [[fa:دادگان]] [[fr:Base de données]] [[ga:Bunachar sonraí]] [[gl:Base de datos]] [[ko:데이터베이스]] [[hi:डेटाबेस]] [[hr:Baza podataka]] 
[[id:Basis data]] [[ia:Base de datos]] [[is:Gagnagrunnur]] [[it:Database]] [[he:בסיס נתונים]] [[ka:მონაცემთა ბაზა]] [[ku:Danegeh]] [[lv:Datu bāze]] [[lt:Duomenų bazė]] [[hu:Adatbázis]] [[ml:ഡാറ്റാബേസ്]] [[ms:Pangkalan data]] [[nl:Database]] [[ja:データベース]] [[no:Database]] [[uz:Ma'lumotlar Bazasi]] [[pl:Baza danych]] [[pt:Banco de dados]] [[ro:Bază de date]] [[ru:База данных]] [[sq:Baza e të dhënave]] [[si:දත්ත සමුදාය]] [[simple:Database]] [[sk:Databáza]] [[sl:Podatkovna baza]] [[sr:База података]] [[sh:Baza podataka]] [[fi:Tietokanta]] [[sv:Databas]] [[tl:Database]] [[ta:தரவுத்தளம்]] [[th:ฐานข้อมูล]] [[vi:Cơ sở dữ liệu]] [[tr:Veri tabanı]] [[uk:База даних]] [[zh:数据库]]


Revision as of 21:48, 18 November 2008

A Computer Database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.



This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system.



Contents

   * 1 Database management systems
         o 1.1 Relational database management systems
         o 1.2 Post-relational database models
         o 1.3 Object database models
   * 2 DBMS internals
         o 2.1 Storage and physical database design
               + 2.1.1 Indexing
         o 2.2 Transactions and concurrency
         o 2.3 Replication
         o 2.4 Security
         o 2.5 Locking
         o 2.6 Architecture
   * 3 Applications of databases
   * 4 Links to DBMS products
   * 5 See also
   * 6 References
   * 7 External links

Database management systems

   Main article: Database management system

Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

   The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables.
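
As a minimal sketch of what "no explicit pointers" means in practice, the following Python example uses the standard sqlite3 module. The supplier and part tables and their columns are hypothetical, chosen only for illustration. The two tables are related purely by matching values (supplier_code against code), and the join recovers the relationship at query time.

```python
import sqlite3

# Hypothetical schema: parts refer to suppliers by value, not by pointer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE part (part_no INTEGER PRIMARY KEY, descr TEXT, supplier_code TEXT);
    INSERT INTO supplier VALUES ('S1', 'Acme'), ('S2', 'Globex');
    INSERT INTO part VALUES (1, 'bolt', 'S1'), (2, 'nut', 'S1'), (3, 'gear', 'S2');
""")

# The relationship is reconstructed by comparing column values in a join.
rows = conn.execute("""
    SELECT part.descr, supplier.name
    FROM part JOIN supplier ON part.supplier_code = supplier.code
    ORDER BY part.part_no
""").fetchall()
```

Because the link is an ordinary value, either table can be reorganized on disk without breaking the relationship.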

Post-relational database models

Several products have been identified as post-relational because their data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. These models might be described as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

DBMS internals

Storage and physical database design

   Main article: Database storage structures


Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages, discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is often used to reduce join complexity and reduce execution time for queries. [1]

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
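
The "sorted list with pointers" form of index described above can be sketched in a few lines of Python. This is an illustration only, not how any particular DBMS stores its indexes. The rows, the city column and the lookup helper are all hypothetical, and the row "pointer" is simply a list position.

```python
import bisect

# A tiny table; the position in the list stands in for a row pointer.
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Cairo"},
    {"id": 3, "city": "Lima"},
    {"id": 4, "city": "Cairo"},
]

# The index: (column value, row position) pairs kept in sorted order.
city_index = sorted((row["city"], pos) for pos, row in enumerate(rows))

def lookup(index, value):
    """Return the positions of all rows whose indexed column equals value."""
    # Binary search finds the first entry for value without scanning the table.
    i = bisect.bisect_left(index, (value, -1))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(index[i][1])
        i += 1
    return matches
```

A lookup such as lookup(city_index, "Cairo") touches only the matching index entries, whereas a scan without the index would have to examine every row.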

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL statement will compute the same result with or without an index. The RDBMS produces a plan of how to execute the query by estimating the cost of the different strategies available and selecting the one it expects to run fastest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
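
The transparency of indexes can be illustrated with a short sketch using Python's standard sqlite3 module. The orders table and the index name are hypothetical. The same query text is run before an index exists, while it exists, and after it has been dropped. SQLite's EXPLAIN QUERY PLAN statement is used to capture the plan chosen while the index is present, but its output format differs between SQLite versions, so it is not interpreted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("c%d" % (i % 50), float(i)) for i in range(1000)],
)

# The application's query never mentions an index.
query = "SELECT COUNT(*), SUM(total) FROM orders WHERE customer = ?"

without_index = conn.execute(query, ("c7",)).fetchone()

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
with_index = conn.execute(query, ("c7",)).fetchone()
# Ask the engine which strategy it chose while the index exists.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("c7",)).fetchall()

conn.execute("DROP INDEX idx_orders_customer")
after_drop = conn.execute(query, ("c7",)).fetchone()
```

All three executions return the same answer; only the access path, and therefore the running time, may differ.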

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
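
A running ID number used as a primary key might look like the following sketch, again using Python's sqlite3 module with a hypothetical person table. In SQLite an INTEGER PRIMARY KEY column is assigned automatically when omitted, and an attempt to reuse an existing key is rejected, which is the uniqueness guarantee described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# The id column is filled in with a running number when it is not supplied.
first_id = conn.execute("INSERT INTO person (name) VALUES ('Ada')").lastrowid
second_id = conn.execute("INSERT INTO person (name) VALUES ('Alan')").lastrowid

# Reusing an existing key violates the uniqueness of the primary index.
try:
    conn.execute("INSERT INTO person (id, name) VALUES (?, 'Grace')", (first_id,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```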

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

   * Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
   * Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
   * Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
   * Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
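
Atomicity and consistency from the list above can be sketched with Python's sqlite3 module. The account table, its CHECK constraint and the transfer helper are hypothetical. The transfer deliberately credits the destination before debiting the source, so that a failed debit leaves a half-finished transaction for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the declared consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

If the debit would overdraw the source account, the constraint fails and the earlier credit is rolled back with it, so either both updates happen or neither does.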

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

   * Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
   * Quorum: The results of read and write requests are determined by querying a "majority" of replicas.
   * Multimaster: Two or more replicas sync each other via a transaction identifier.

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
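
The idea that a logged sequence of actions is enough to build a duplicate can be sketched as a toy master/slave arrangement in Python. The Node and Master classes are hypothetical and keep everything in memory; a real system would ship the log over a network and handle failures.

```python
class Node:
    """A copy of the data that is built up by applying log entries in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class Master(Node):
    """Accepts all writes, logs them, and replays the log on its replicas."""

    def __init__(self, replicas):
        super().__init__()
        self.log = []
        self.replicas = replicas

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.log.append(entry)  # record the action first
        self.apply(entry)       # then apply it locally

    def replicate(self):
        # Send each replica only the entries it has not yet applied.
        for replica in self.replicas:
            for entry in self.log[replica.applied:]:
                replica.apply(entry)
```

Between calls to replicate the replicas lag behind the master, which is the trade-off that the synchronous schemes mentioned above are designed to avoid.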

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

   * Access control ensures and restricts who can connect and what can be done to the database.
   * Auditing logs what action or change has been performed, when and by whom.
   * Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required, using SSL or a legacy encryption standard; DSA (a digital signature algorithm) and MD5 (a hash function) can support authentication and integrity checking but do not themselves encrypt data.

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom based organizations holding personal data in electronic format (databases, for example) are required to register with the Information Commissioner.[2]

Locking

Locking is how the database handles multiple concurrent operations. This is how concurrency and some form of basic integrity are managed within the database system. Locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (a group of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

In a basic filesystem, only one lock at a time can be set on a file or folder, restricting its use to one process only. A database, by contrast, can set and hold multiple locks at the same time on different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL statements or transactions submitted by users. Generally speaking, a database with no activity should hold no locks, or very few.

For most DBMSs on the market, locks are generally shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, as during an UPDATE or DELETE operation.

Several shared locks can be held on the same data structure at the same time. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of the locks, and the time for which a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
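
The compatibility rule between shared and exclusive locks can be sketched as a small lock table in Python. The LockTable class is hypothetical and only grants or refuses requests; a real lock manager would also queue waiting transactions and support more lock modes and granularities.

```python
class LockTable:
    """Grants shared ("S") and exclusive ("X") locks on named resources."""

    def __init__(self):
        self.held = {}  # resource -> (mode, set of owners)

    def acquire(self, owner, resource, mode):
        """Return True if the lock is granted, False if it conflicts."""
        current = self.held.get(resource)
        if current is None:
            self.held[resource] = (mode, {owner})
            return True
        held_mode, owners = current
        if mode == "S" and held_mode == "S":
            owners.add(owner)  # any number of readers may share
            return True
        if owners == {owner}:
            # The sole holder may re-acquire or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.held[resource] = (new_mode, owners)
            return True
        return False  # an exclusive lock conflicts with everything else

    def release(self, owner, resource):
        mode, owners = self.held[resource]
        owners.discard(owner)
        if not owners:
            del self.held[resource]
```

Two readers can hold a shared lock on the same row together, while a writer must wait until every other lock on that row has been released.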

Default locking behavior is determined by the isolation level of the database server. Changing the isolation level affects how shared or exclusive locks must be set on the data for the entire database system. The default isolation level is generally 1, at which data cannot be read while it is being modified, so that "ghost data" is never returned to the end user.

Intensive or inappropriate exclusive locking can lead to a "deadlock" between two locks, in which neither lock can be released because each is waiting to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. The process or transaction that is sacrificed in this way is rolled back.
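
One common way to detect the situation described above is to look for a cycle among transactions that are waiting for each other. The following Python sketch assumes that each transaction waits for at most one other, and it picks the most recently started transaction in the cycle as the one to sacrifice. Both the simplification and the victim policy are assumptions made for illustration, not the behavior of any specific DBMS.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a cycle in waits_for, or None.

    waits_for maps each blocked transaction to the one it is waiting for.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waits until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


def resolve(waits_for, start_time):
    """Pick a victim from a deadlock and return (victim, remaining waits)."""
    cycle = find_deadlock(waits_for)
    if cycle is None:
        return None, dict(waits_for)
    # Assumed policy: roll back the youngest transaction in the cycle.
    victim = max(cycle, key=lambda txn: start_time[txn])
    # The victim stops waiting, and its released locks unblock its waiters.
    remaining = {
        txn: blocker
        for txn, blocker in waits_for.items()
        if txn != victim and blocker != victim
    }
    return victim, remaining
```

After the victim is rolled back, the remaining wait relationships no longer contain a cycle, so the surviving transactions can proceed.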

Databases can also be locked for other reasons, like access restrictions for given levels of user. Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. See "Locking tables and databases" (a section of IBM documentation) for more detail.

Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
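
The difference between the row-oriented and column-oriented layouts can be sketched with plain Python lists. The table contents and the helper functions are hypothetical. The same data is held once as whole records and once as one list per column.

```python
names = ("id", "city", "temp")

# Row-oriented layout: each record is stored together.
rows = [
    (1, "Oslo", 12.5),
    (2, "Cairo", 30.0),
    (3, "Lima", 19.0),
]

# Column-oriented layout: all values of one column are stored together.
columns = dict(zip(names, (list(col) for col in zip(*rows))))

def fetch_record(rows, position):
    """Typical transactional access: read one whole record."""
    return rows[position]

def average(columns, name):
    """Typical analytical access: scan a single column."""
    values = columns[name]
    return sum(values) / len(values)
```

Fetching one complete record touches a single entry in the row layout, while an aggregate over one column reads a single list in the column layout and never touches the other columns.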

Document-oriented databases, XML databases, knowledge bases, frame databases and RDF stores (also known as triple stores) may also use a combination of these architectures in their implementation.

Finally, it should be noted that not all databases have or need a database 'schema' (so-called schema-less databases).

There are also other types of database which cannot be classified as relational databases.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

For example, a suppliers database contains data relating to suppliers, such as:

   * supplier name
   * supplier code
   * supplier address
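A suppliers database of this kind might be sketched as follows, using Python's standard `sqlite3` module, which follows Python's common database interface (DB-API) in much the same spirit as JDBC and ODBC. The table layout, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suppliers ("
    " supplier_code TEXT PRIMARY KEY,"
    " supplier_name TEXT NOT NULL,"
    " supplier_address TEXT)"
)
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?, ?)",
    [
        ("S001", "Acme Parts", "1 Example Road"),
        ("S002", "Bolt & Co", "2 Sample Street"),
    ],
)
conn.commit()

# Retrieve one supplier by its code, using a parameterized query.
row = conn.execute(
    "SELECT supplier_name, supplier_address"
    " FROM suppliers WHERE supplier_code = ?",
    ("S002",),
).fetchone()
```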

Databases are also often used by schools, both to teach students and to record their grades.

Links to DBMS products

   Main article: Category:Database management systems
   * 4D
   * ADABAS
   * Alpha Five
   * Apache Derby (Java, also known as IBM Cloudscape and Sun Java DB)
   * BerkeleyDB
   * CouchDB
   * CSQL
   * Datawasp
   * Db4objects
   * dBase
   * FileMaker
   * Firebird (database server)
   * H2 (Java)
   * Hsqldb (Java)
   * IBM DB2
   * IBM IMS (Information Management System)
   * IBM UniVerse
   * Informix
   * Ingres
   * Interbase
   * InterSystems Caché
   * MaxDB (formerly SapDB)
   * Microsoft Access
   * Microsoft SQL Server
   * Model 204
   * MySQL
   * Nomad
   * Objectivity/DB
   * ObjectStore
   * OpenLink Virtuoso
   * OpenOffice.org Base
   * Oracle Database
   * Paradox (database)
   * Polyhedra DBMS
   * PostgreSQL
   * Progress 4GL
   * RDM Embedded
   * ScimoreDB
   * Sedna
   * SQLite
   * Superbase
   * Sybase
   * Teradata
   * Vertica
   * Visual FoxPro

See also

   * Comparison of relational database management systems
   * Comparison of database tools
   * Database-centric architecture
   * Database theory
   * Government database
   * Online database
   * Real time database

References

Notes

  1. ^ S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
  2. ^ Information Commissioner's Office - ICO

Bibliography

   * Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
   * Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
   * Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
   * Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
   * Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
   * Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
   * Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
   * Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
   * Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
   * Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
   * Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.

External links

   * comp.databases.theory (Database Theory Discussion Group)
   * Web page about FSQL: References and links about FSQL
   * Increase Database Performance
   * Database discussion forums
   * The EM-DAT International Disaster Database
   * The CE-DAT Complex Emergency Database




Relational database management systems

An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:

The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows (tuples) in relations. Therefore, there are no explicit pointers between related tables.

Post-relational database models

Several products have been identified as post-relational because their data model incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. Such models might be described as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This typically also requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

DBMS internals

Storage and physical database design

Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages, discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Other important design choices relate to the clustering of data by category (such as grouping data by month or location), the creation of pre-computed views known as materialized views, and the partitioning of data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is conversely often used to reduce join complexity and reduce execution time for queries. [1]
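Partitioning by range or by hash can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any product's scheme: the function names, the use of CRC-32 as the hash function, and the quarterly boundaries are all hypothetical.

```python
import zlib


def hash_partition(key, n_partitions):
    # CRC-32 is used because it gives the same value on every run,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def range_partition(value, boundaries):
    # boundaries holds the inclusive upper bound of each partition,
    # in ascending order, e.g. [3, 6, 9, 12] for calendar quarters.
    for index, upper in enumerate(boundaries):
        if value <= upper:
            return index
    raise ValueError("value lies above the last boundary")


quarters = [3, 6, 9, 12]
may_partition = range_partition(5, quarters)       # month 5 falls in the second quarter
customer_partition = hash_partition("S001", 4)     # some partition from 0 to 3
```

Range partitioning keeps neighbouring values together, which suits queries over a period such as a month; hash partitioning spreads keys evenly but scatters neighbouring values.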

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
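The "sorted list with pointers" form of index can be sketched in Python using the standard `bisect` module. The sample table and the `lookup` helper are assumptions made for illustration; real systems more often use B-tree variants, but the principle of searching a sorted structure instead of scanning every row is the same.

```python
import bisect

# A small table; a row's position in the list stands in for its address.
table = [
    ("Smith", "London"),
    ("Jones", "Leeds"),
    ("Adams", "York"),
    ("Jones", "Bath"),
]

# Index on the first column: sorted (value, row position) pairs.
index = sorted((row[0], pos) for pos, row in enumerate(table))


def lookup(index, table, value):
    # Binary search for the first entry whose key is >= value.
    # The 1-tuple (value,) sorts before every (value, pos) pair.
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break
        matches.append(table[pos])  # follow the pointer to the row
    return matches


jones_rows = lookup(index, table, "Jones")
```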

Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. In other words, indexes are transparent to the application or end-user querying the database: while they affect performance, any SQL statement will produce the same result with or without an index. The RDBMS produces a plan of how to execute each query, choosing between many different strategies by estimating the run times of the different algorithms and selecting the one it expects to be quickest. Some of the key algorithms that deal with joins are the nested loop join, the sort-merge join and the hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
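Two of these join algorithms can be sketched in Python to show that they compute the same result by different routes. This is a simplified illustration over in-memory lists of dictionaries; the function names and sample data are assumptions, and a real query executor works on pages of stored rows rather than Python objects.

```python
def nested_loop_join(left, right, key):
    # Compare every left row with every right row: simple, but the work
    # grows with len(left) * len(right).
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] == r[key]
    ]


def hash_join(left, right, key):
    # Build phase: group the right-hand rows by join key.
    buckets = {}
    for r in right:
        buckets.setdefault(r[key], []).append(r)
    # Probe phase: one dictionary lookup per left-hand row.
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


orders = [
    {"order_id": 1, "supplier_code": "S001"},
    {"order_id": 2, "supplier_code": "S002"},
    {"order_id": 3, "supplier_code": "S001"},
]
suppliers = [
    {"supplier_code": "S001", "supplier_name": "Acme Parts"},
    {"supplier_code": "S002", "supplier_name": "Bolt & Co"},
]

joined_nested = nested_loop_join(orders, suppliers, "supplier_code")
joined_hash = hash_join(orders, suppliers, "supplier_code")
```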

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
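The uniqueness guarantee and the running ID number can be observed with Python's standard `sqlite3` module. The table and column names below are assumptions made for illustration, and the automatic numbering shown is SQLite's behaviour for an `INTEGER PRIMARY KEY` column; other systems offer comparable mechanisms under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# When no id is supplied, SQLite assigns the next number in sequence.
first_id = conn.execute("INSERT INTO customers (name) VALUES ('Ada')").lastrowid
second_id = conn.execute("INSERT INTO customers (name) VALUES ('Grace')").lastrowid

# Reusing an existing key violates the primary key's uniqueness.
try:
    conn.execute(
        "INSERT INTO customers (id, name) VALUES (?, 'Duplicate')", (first_id,)
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```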

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

  • Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
  • Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
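Atomicity and consistency can be demonstrated with a short sketch using Python's standard `sqlite3` module. The `accounts` table, the `CHECK` constraint and the `transfer` helper are assumptions made for illustration; the point is only that a transaction which fails part-way leaves no trace of its earlier steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # declared consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, source, target, amount):
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit above has been undone as well


ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to `bob` is applied first and then undone when the debit violates the constraint, so the balances are the same as if the transfer had never started.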

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

  • Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
  • Quorum: The results of read and write requests are determined by querying a "majority" of replicas.
  • Multimaster: Two or more replicas sync each other via a transaction identifier.
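The idea that a log of individual actions is enough to build a duplicate can be sketched in Python for the master/slave case. The classes `Node` and `Primary` and the key-value data model are hypothetical simplifications; real replication also has to handle network failures, ordering and conflicts, which are omitted here.

```python
class Node:
    """A replica that rebuilds its data by replaying a log of writes."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, log):
        # Replay only the entries this node has not yet seen.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


class Primary(Node):
    """The master: every write is logged first, then applied locally."""

    def __init__(self):
        super().__init__()
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))
        self.apply(self.log)


primary = Primary()
replica = Node()

primary.write("stock:S001", 10)
replica.apply(primary.log)        # the replica catches up
primary.write("stock:S001", 7)    # the replica is now one entry behind
lagging_value = replica.data["stock:S001"]
replica.apply(primary.log)
```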

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

  • Access control restricts who can connect to the database and what they can do once connected.
  • Auditing logs what action or change has been performed, when and by whom.
  • Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required, for example using SSL or a legacy encryption standard; related mechanisms such as DSA (digital signatures) and MD5 (hashing) support authentication and integrity checking rather than encryption itself.

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom-based organizations holding personal data in electronic format (databases, for example) are required to register with the Information Commissioner. [2]

Locking

Locking is how the database handles multiple concurrent operations, and it is how concurrency and a basic form of integrity are managed within the database system. Locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (an array of multiple pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data. In a basic filesystem, only one lock at a time can be set on a file or folder, restricting its use to a single process; a database, by contrast, can set and hold multiple locks at the same time at different levels of the physical data structure. How locks are set and how long they last is determined by the locking scheme of the database engine, based on the SQL statements or transactions submitted by users. Generally speaking, little or no activity on the database should translate into little or no locking.

For most DBMSs on the market, locks are generally either shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object for as long as the exclusive lock lasts. Exclusive locks are usually set when the database needs to change data, as during an UPDATE or DELETE operation.

Several shared locks can be held on the same data object at the same time. Shared locks are usually used while the database is reading data, as during a SELECT operation. The number and nature of the locks, and the length of time a lock is held on a data block, can have a large impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
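The compatibility rules for shared and exclusive locks can be sketched as a small lock table. The LockManager class and its method names are hypothetical, and the sketch simply returns False where a real DBMS would normally make the requesting transaction wait.

```python
from collections import defaultdict

SHARED, EXCLUSIVE = "S", "X"


class LockManager:
    """Tracks which transactions hold which kind of lock on a resource."""

    def __init__(self):
        # resource -> {transaction: lock mode}
        self.holders = defaultdict(dict)

    def try_lock(self, txn, resource, mode):
        held = self.holders[resource]
        others = [m for t, m in held.items() if t != txn]
        if mode == SHARED:
            # A shared lock is compatible only with other shared locks.
            compatible = all(m == SHARED for m in others)
        else:
            # An exclusive lock is compatible with no other lock at all.
            compatible = not others
        if not compatible:
            return False
        # Keep the stronger mode if the transaction already holds a lock.
        if EXCLUSIVE in (held.get(txn), mode):
            held[txn] = EXCLUSIVE
        else:
            held[txn] = SHARED
        return True

    def release_all(self, txn):
        # Called when the transaction commits or rolls back.
        for held in self.holders.values():
            held.pop(txn, None)
```

Two readers can therefore share a row, while a writer has to wait until every other lock on that row has been released.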

Default locking behavior is determined by the isolation level of the data server. Changing the isolation level affects how shared and exclusive locks are set on the data for the entire database system. The default isolation level is generally 1, at which data cannot be read while it is being modified, so that "ghost data" is never returned to the end user.

At some point, intensive or inappropriate exclusive locking can lead to a "deadlock" between two locks, in which neither lock can be released because each is trying to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. When this happens, the process or transaction holding the sacrificed lock is rolled back.
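One common way to detect such a situation is to look for a cycle in a "waits-for" graph, in which each transaction points to the transactions whose locks it is waiting on. The sketch below is illustrative only: the function names are hypothetical, and choosing the most recently started transaction as the victim is an assumption of this example, since real database engines use their own victim-selection policies.

```python
def find_cycle(waits_for):
    """Return the transactions forming a cycle in the graph, or None.

    waits_for maps each transaction to the set of transactions it waits on.
    """
    visiting, finished = [], set()

    def visit(txn):
        if txn in finished:
            return None
        if txn in visiting:
            # Reached a transaction already on the current path: a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        finished.add(txn)
        return None

    for txn in list(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


def resolve_deadlock(waits_for, start_time):
    """Pick a victim from a deadlock cycle and remove it from the graph."""
    cycle = find_cycle(waits_for)
    if cycle is None:
        return None
    # Assumed policy: sacrifice the youngest transaction in the cycle.
    victim = max(cycle, key=lambda txn: start_time[txn])
    waits_for.pop(victim, None)
    for waiting in waits_for.values():
        waiting.discard(victim)
    return victim
```

Once the victim has been removed from the graph, which corresponds to rolling it back and releasing its locks, the remaining transactions can proceed.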

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Databases are also locked for routine maintenance, which prevents changes being made while the maintenance is in progress. See "Locking tables and databases" (a section in IBM's documentation) for more detail.

Architecture

Depending on the intended use, a number of database architectures are in use, and many databases use a combination of strategies. Online transaction processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications, such as Google's BigTable or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
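The difference between the two layouts can be sketched with plain Python data structures. The sample records and the function names are invented for this example, and real systems store the data on disk in far more elaborate formats. A row-oriented layout keeps each record together, which suits fetching or updating one record at a time; a column-oriented layout keeps each attribute together, which suits scanning a single attribute across many records.

```python
# Row-oriented layout: one entry per record.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 200},
]


def to_columns(rows):
    """Convert a row-oriented layout into a column-oriented one."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: one list per attribute.
columns = to_columns(rows)


def get_record(rows, record_id):
    """Transaction-style access: fetch one complete record."""
    return next(row for row in rows if row["id"] == record_id)


def total(columns, name):
    """Analytic-style access: aggregate a single attribute."""
    return sum(columns[name])
```

Here total(columns, "amount") reads only the amount list, whereas the same aggregate over rows would have to touch every field of every record.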

Document-oriented databases, XML databases, knowledge bases, frame databases and RDF stores (also known as triple stores) may also use a combination of these architectures in their implementation.

Finally, not all databases have or need a database schema; those without one are called schema-less databases.

There are also other types of database that cannot be classified as relational databases.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

For example, a suppliers database contains data relating to suppliers, such as:

  • supplier name
  • supplier code
  • supplier address

Databases are also often used by schools to keep records of students and their grades.
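The suppliers example above can be sketched with Python's standard sqlite3 module, which follows the Python DB-API, a common interface that plays a role in Python similar to that of JDBC and ODBC elsewhere. The table layout, the column names and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suppliers ("
    "supplier_code TEXT PRIMARY KEY, "
    "supplier_name TEXT NOT NULL, "
    "supplier_address TEXT)"
)
# Sample data invented for the example.
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?, ?)",
    [
        ("S001", "Acme Supplies", "1 Example Street"),
        ("S002", "Widget Works", "2 Sample Road"),
    ],
)
conn.commit()


def find_supplier(conn, code):
    """Look up a supplier's name and address by supplier code."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT supplier_name, supplier_address "
        "FROM suppliers WHERE supplier_code = ?",
        (code,),
    )
    return cursor.fetchone()  # None if the code is unknown
```

Because the application only uses the connection, cursor and execute calls of the common interface, the same code could in principle be pointed at a different database through another DB-API driver, though the parameter placeholder style and SQL dialect may differ between drivers.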

See also

References

Notes
  1. ^ S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
  2. ^ Information Commissioner's Office - ICO
Bibliography
  • Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
  • Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
  • Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
  • Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
  • Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
  • Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
  • Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.
