NOTE: This is a draft for a revised version of the article at Grid Computing. It is a work in progress and will not be implemented without consensus approval on Talk:Grid computing.
This work has been started by User:Ora but he is not trying to 'own' the process :). If others wish to participate in this please leave a comment below this note so we can know who each other are. As background I work in the grid field, though am not technical and would appreciate help in these areas. In the spirit of full disclosure I work for the EGEE project, I encourage other contributors to likewise indicate who they work for to avoid perceptions of hidden bias.
ALo, I really want at least the opening of the article to make sense to non-technical readers. I think the current one is in many places needlessly complex.
Copied from main talk page, written by Bovineone 00:07, 15 March 2006 (UTC)
Reworked version by ora 11:31, 23 May 2006 (UTC) (in progress). Minor rework and a first stab at a new lead Andreww 17:58, 24 May 2006 (UTC)
Grid computing is an emerging methodology for computing in which distributed computers, under the control of dispersed organisations (or parts of a large organisation), are collected together to provide a more powerful resource than would be available to the separate organisations. Grids can provide a higher throughput of computations by running tasks across many networked computers. They can also provide facilities for the storage and analysis of very large data sets, enable communication between users and allow better resource usage between (for example) supercomputer centers. Because the creation of a grid cuts across administrative domains, it can lead to the creation of a distributed team focused on solving the problems the grid was created to solve. Such a team is known as a virtual organization. Some current research into grid technology is focused on inter-grid communications. Ultimately, this may lead to the creation of "The Grid" in the same way that internetworking research led to the creation of "The Internet".
Second paragraph - Where does the concept of a grid come from? How is it used today (not just in the sense described above)? One or two good examples (if possible, examples that have their own page to link to)...
Because of the rapid and continuing growth of the use of grid technologies in academia, research and industry, it is not easy to define exactly what is meant by the term "grid computing". Indeed, there may be as many definitions as there are grid user communities. Ian Foster summed up the problem in his article "What is the Grid? A Three Point Checklist".
Foster goes on to argue that grids must possess three features: they are built of computing resources that are not administered centrally, they use open standards, and they deliver a non-trivial quality of service. CERN uses a pragmatic definition whereby a grid is "a service for sharing computer power and data storage capacity over the Internet"[add ref]. This second definition leaves much latitude regarding exactly what is meant by "grid", and there has been much use of the term as a marketing tool. [some examples...]
Grid computing is an emerging computing model that provides for higher-throughput computing by taking advantage of many networked computers to model a virtual computer architecture that is able to distribute process execution across a parallel infrastructure. Grids use the resources of many separate computers connected by a network (usually the Internet) to solve large-scale computation problems. Grids provide the ability to perform computations on large data sets, by breaking them down into many smaller ones, or the ability to perform many more computations at once than would be possible on a single computer, by modelling a parallel division of labour between processes. Nowadays, resource allocation in a grid is done in accordance with a service level agreement (SLA).
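The division-of-labour idea above can be pictured in a few lines: a large data set is broken into independent work units, which a grid could then farm out to separate machines. The sketch below is illustrative only, using local processes in place of networked grid nodes and a toy sum-of-squares task in place of a real workload.

```python
from concurrent.futures import ProcessPoolExecutor

def split(data, n_chunks):
    """Break a large data set into smaller, independent work units."""
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    # Stand-in work unit; a real grid job might analyse a slice of
    # experimental data instead.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    chunks = split(data, 4)
    # Local processes stand in for networked grid nodes here.
    with ProcessPoolExecutor(max_workers=4) as pool:
        partial = list(pool.map(process_chunk, chunks))
    # The combined partial results match processing the data set whole.
    assert sum(partial) == sum(x * x for x in data)
```

Because the chunks are independent, the same pattern scales from one machine's cores to many machines, which is the property grids exploit.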
Like the Internet, the Grid computing concept has evolved from the computational needs of "big science". The Internet was developed to meet the need for a common communication medium between large, federally funded computing centers. These communication links led to resource and information sharing between the centers, and eventually to access to them for additional users. Ad hoc resource-sharing procedures among these original groups pointed the way toward standardization of the protocols needed to communicate between any administrative domains. Current Grid technology can be viewed as an extension or application of this framework to create a more generic resource-sharing context.
The non-profit SETI@home project is one of the most widely known scientific projects to have created a simple Grid computing application using CPU scavenging. (Grid purists point out that SETI@home is really a distributed computing application, as it makes use of almost no Grid concepts.) SETI@home was not the first to pioneer the technique: cycle-stealing in a local computer network dates back to at least the 1970s, and other non-profit projects such as distributed.net preceded it. SETI@home's popularity means it has been followed by many others covering tasks such as protein folding, research into drugs for cancer, mathematical problems and climate models. Most of these projects work by running as a screensaver or background program on users' personal computers, processing small pieces of the overall data while the computer is otherwise idle or lightly used. Many such projects have achieved results that would otherwise have been much delayed or have required prohibitive investment.
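The cycle-scavenging pattern these projects use, doing work only while the machine is otherwise idle, can be sketched as below. This is a toy illustration, not any project's actual client: the load-average check is Unix-specific, and real clients also monitor user input, checkpoint their work and throttle CPU use.

```python
import os
import time

def machine_is_idle(threshold=0.5):
    """Crude idle test using the 1-minute load average (Unix-only).
    If no load information is available, this sketch assumes idle."""
    try:
        load1, _, _ = os.getloadavg()
    except (OSError, AttributeError):
        return True
    return load1 < threshold

def scavenge(work_units, do_work, threshold=0.5, backoff=30):
    """Process work units one at a time, pausing whenever the host
    appears busy so the machine's owner is not slowed down."""
    results = []
    for unit in work_units:
        while not machine_is_idle(threshold):
            time.sleep(backoff)  # back off while the owner needs the CPU
        results.append(do_work(unit))
    return results
```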
While proprietary Grid computing has been used in different companies and labs for years, one of the first proprietary commercial Grid offerings was launched by Entropia in 1997. A significant difference between Grids and proprietary Grid-like projects such as SETI@home is that Grids allow for jobs to be moved to any node on the Grid and executed. For example, SETI@home's screensaver contains both code to process radio telescope data and code to handle retrieving work and returning results. The two bodies of code are intertwined into a single program. In a Grid, only the code required for retrieving work and returning results persists on the nodes. Code required to perform the distributed work is sent to the nodes separately. In this way, the nodes of a Grid can be easily reprogrammed.
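The separation described above, a fixed harness on every node plus work code shipped with each job, can be illustrated with a toy example. Everything here is hypothetical, and raw exec() of received source is shown only for clarity: real grid middleware distributes signed, sandboxed job packages rather than bare code strings.

```python
def run_job(job):
    """Generic harness that stays resident on every node: it knows how
    to receive a job and return a result, but nothing about the work."""
    namespace = {}
    exec(job["code"], namespace)  # load the work code shipped with the job
    return namespace["work"](job["payload"])

# A job pairs task-specific code with its input data, so a node can be
# 'reprogrammed' simply by sending it a different job.
job = {
    "code": "def work(data):\n    return max(data) - min(data)\n",
    "payload": [3, 9, 1, 7],
}
print(run_job(job))  # → 8
```

By contrast, a SETI@home-style client compiles the work code and the harness into one program, so changing the task means shipping a whole new program.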
Parabon Computation was awarded a patent on this business model in 2002. [1]
Grid computing offers a model for solving massive computational problems by making use of the unused resources (CPU cycles and/or disk storage) of large numbers of disparate computers, often desktop computers, treated as a virtual cluster embedded in a distributed telecommunications infrastructure. Grid computing's focus on the ability to support computation across administrative domains sets it apart from traditional computer clusters or traditional distributed computing.
Grids offer a way to solve Grand Challenge problems like protein folding, financial modelling, earthquake simulation, and climate/ weather modelling. Grids offer a way of using the information technology resources optimally inside an organization. They also provide a means for offering information technology as a utility bureau for commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.
Grid computing has the design goal of solving problems too big for any single supercomputer, whilst retaining the flexibility to work on multiple smaller problems. Thus Grid computing provides a multi-user environment. Its secondary aims are better exploitation of available computing power and catering for the intermittent demands of large computational exercises.
This approach implies the use of secure authorization techniques to allow remote users to control computing resources.
Grid computing involves sharing heterogeneous resources (based on different platforms, hardware/software architectures, and computer languages), located in different places belonging to different administrative domains over a network using open standards. In short, it involves virtualizing computing resources.
Grid computing is often confused with cluster computing. The key difference is that a cluster is a single set of nodes sitting in one location, while a Grid is composed of many clusters and other kinds of resources (e.g. networks, storage facilities).
Functionally, one can classify Grids into several types:
The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid.
Today there are many definitions of Grid computing:
Grids can be categorized with a three-stage model of departmental Grids, enterprise Grids and global Grids. These correspond to a firm initially utilising resources within a single group, e.g. an engineering department connecting desktop machines, clusters and equipment. This progresses to enterprise Grids, where the computing resources of non-technical staff can be used for cycle-stealing and storage. A global Grid is a connection of enterprise and departmental Grids that can be used in a commercial or collaborative manner.
Grid computing is a subset of distributed computing.
Grid computing reflects a conceptual framework rather than a physical resource. The Grid approach is utilized to provision a computational task with administratively-distant resources. The focus of Grid technology is associated with the issues and requirements of flexible computational provisioning beyond the local (home) administrative domain.
A Grid environment is created to address resource needs. The use of those resources (e.g. CPU cycles, disk storage, data, software programs, peripherals) is usually characterized by their availability outside of the context of the local administrative domain. This 'external provisioning' approach entails creating a new administrative domain, referred to as a Virtual Organization (VO), with a distinct and separate set of administrative policies (the home administration policies plus the external resource administration policies together constitute the VO, or Grid, administrative policies). The context for a Grid 'job execution' is distinguished by the requirements created when operating outside of the home administrative context. Grid technology (i.e. middleware) is employed to facilitate formalizing and complying with the Grid context associated with an application's execution.
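The policy composition this paragraph describes (home policies combined with each external provider's policies to yield the VO's policies) can be pictured as a simple set union. The policy names below are invented for illustration; real VO policy frameworks are considerably richer than flat string sets.

```python
def vo_policies(home, *external_providers):
    """Combine the home domain's policies with each external resource
    provider's policies to form the VO's effective policy set."""
    combined = set(home)
    for provider in external_providers:
        combined |= set(provider)
    return combined

# Hypothetical policy names for two administrative domains.
home = {"x509-authentication", "max-job-hours:24"}
cluster_b = {"x509-authentication", "no-outbound-network"}
print(sorted(vo_policies(home, cluster_b)))
```

A job running in this VO must satisfy the combined set, which is why operating outside the home domain imposes requirements the home context alone does not.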
One characteristic that currently distinguishes Grid computing from distributed computing is the abstraction of a 'distributed resource' into a Grid resource. One result of abstraction is that it allows resource substitution to be more easily accomplished. Some of the overhead associated with this flexibility is reflected in the middleware layer and the temporal latency associated with the access of a Grid (or any distributed) resource. This overhead, especially the temporal latency, must be evaluated in terms of the impact on computational performance when a Grid resource is employed.
Web-based resources, or Web-based resource access, are an appealing approach to Grid resource provisioning. A recent GGF Grid middleware evolutionary development 're-factored' the architecture and design of the Grid resource concept to use the W3C WSDL (Web Service Description Language) to implement the concept of a WS-Resource. The stateless nature of the Web, while enhancing the ability to scale, can be a concern for applications that migrate from a stateful resource-access context to the Web-based stateless one. The GGF WS-Resource concept includes discussions on accommodating the statelessness associated with Web resource access.
The conceptual framework and ancillary infrastructure are evolving at a fast pace and include international participation. The business sector is actively involved in commercialization of the Grid framework. The 'big science' sector is actively addressing the development environment and resource (i.e. performance) monitoring aspects. Activity is also observed in providing Grid-enabled versions of HPC (High Performance Computing) tools. Activity in the domains of 'little science' appears to be scant at this time. The treatment in the GGF documentation series reflects the HPC roots of the Grid conceptual framework; this bias should not be interpreted as a restriction on applying the framework to other research domains or computational contexts.
Substantial experience is being built through the operation of various Grids, the most notable of them being the EGEE infrastructure supporting LCG, the LHC Computing Grid [1]. LCG is driven by CERN's need to handle huge amounts of data, produced at a rate of almost a gigabyte per second (10 petabytes per year), a history not unlike that of the production NorduGrid. A list of active sites participating within LCG can be found online [2], as can real-time monitoring of the EGEE infrastructure [3]. The relevant software and documentation are also publicly accessible [4].
The Global Grid Forum (GGF) has the purpose of defining specifications for Grid computing. GGF is a collaboration between industry and academia with significant support from both.
The Globus Alliance implements some of the standards developed at the GGF through the Globus Toolkit (Grid middleware). As a middleware component, it provides a standard platform for services to build upon, but Grid computing also needs other components, and many other tools operate to support a successful Grid environment.
Globus has implementations of the GGF-defined protocols to provide:
A number of tools function along with Globus to make Grid computing a more robust platform, useful to high-performance computing communities. They include:
XML-based web services offer a way to access the diverse services/applications in a distributed environment. As of 2003 the worlds of Grid computing and of web services have started to converge to offer Grid as a web service (Grid Service). The Open Grid Services Architecture (OGSA) has defined this environment, which will offer several functionalities adhering to the semantics of the Grid Service. The vision of OGSA is to describe and to build a well-defined suite of standard interfaces and behaviours that serve as a common framework for all Grid-enabled systems and applications.
Computing vendors offer Grid solutions which are based either on the Globus Toolkit, or a proprietary architecture. Confusion remains in that vendors may badge their computing on demand or cluster offerings as Grid computing.
[[Category:Grid computing|*]] [[Category:Distributed computing]]
de:Grid-Computing es:Computación distribuida fr:Grille de calcul it:Grid computing nl:Grid computing ja:グリッド・コンピューティング pl:Siatka komputerowa zh:网格计算