INFRASTRUCTURE

Grid tools are concerned with resource discovery, data management, scheduling of computation, security, and so forth.

But the Grid goes beyond sharing and distributing data and computing resources. For the scientist, the Grid offers new and more powerful ways of working, as the following examples illustrate:

 Science portals: Science portals make advanced problem-solving methods easier to use by invoking sophisticated packages remotely from Web browsers or other simple, easily downloaded ‘thin clients.’ The packages themselves can also run remotely on suitable computers within a Grid. Such portals are currently being developed in biology, fusion, computational chemistry, and other disciplines.

Distributed computing: High-speed workstations and networks can yoke together an organization’s PCs to form a substantial computational resource. Future improvements in network performance and Grid technologies will increase the range of problems that aggregated computing resources can tackle.

Large-scale data analysis: The analysis of the many petabytes of data to be produced by the LHC and other future high-energy physics experiments will require the marshalling of tens of thousands of processors and hundreds of terabytes of disk space for holding intermediate results.

Computer-in-the-loop instrumentation: Consider an astronomer studying solar flares with a radio telescope array. The deconvolution and analysis algorithms used to process the data and detect flares are computationally demanding. Running the algorithms continuously would be inefficient for studying flares that are brief and sporadic. But if the astronomer could call on substantial computing resources (and sophisticated software) in an on-demand fashion, he or she could use automated detection techniques to zoom in on solar flares as they occurred.

GLOBUS: This toolkit includes software for security, information infrastructure, resource management, data management, communication, fault detection, and portability.

Functions in Grid:

Authentication, authorization and agreement on policies (single sign-on, mapping to local security mechanisms)

Querying the resource catalog for the availability of computers, storage systems, etc.

Requesting resources to initiate operations (run computations, move data, produce output, etc.)

Monitoring the progress of the operation.
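The four functions above can be read as one workflow: sign on once, look up resources, start an operation, then watch it. The sketch below walks through that sequence with a toy in-memory grid. `ToyGrid`, `Resource` and every method name are hypothetical and exist only for illustration; this is not the Globus API.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    cpus: int
    busy: bool = False


@dataclass
class ToyGrid:
    """Hypothetical in-memory stand-in for a Grid (not a real Globus API)."""
    users: dict                                  # user name -> secret
    catalog: list                                # list of Resource
    jobs: dict = field(default_factory=dict)     # job id -> job record

    def authenticate(self, user, secret):
        # Step 1: single sign-on; the token is reused for all later calls.
        if self.users.get(user) != secret:
            raise PermissionError("authentication failed")
        return f"token-{user}"

    def _check(self, token):
        if not token.startswith("token-") or token[6:] not in self.users:
            raise PermissionError("invalid token")

    def query(self, token, min_cpus):
        # Step 2: ask the resource catalog what is currently available.
        self._check(token)
        return [r for r in self.catalog if not r.busy and r.cpus >= min_cpus]

    def request(self, token, resource, task):
        # Step 3: claim a resource and start an operation on it.
        self._check(token)
        resource.busy = True
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = {"resource": resource.name, "task": task,
                             "state": "running"}
        return job_id

    def monitor(self, token, job_id):
        # Step 4: report the progress of the operation.
        self._check(token)
        return self.jobs[job_id]["state"]


grid = ToyGrid(users={"alice": "s3cret"},
               catalog=[Resource("cluster-a", 64), Resource("pc-lab", 8)])
token = grid.authenticate("alice", "s3cret")
available = grid.query(token, min_cpus=16)
job_id = grid.request(token, available[0], task="deconvolve")
status = grid.monitor(token, job_id)
```

A real Grid would back each step with a separate service (a security infrastructure, an information service, a resource manager), but the order of calls would be much the same.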

__________________________________________________________________

Database and the grid

There are two main dimensions of complexity to the problem of integrating databases into the Grid: implementation differences between server products within a database paradigm and the variety of database paradigms.

A database is a collection of related data. A database management system (DBMS) is responsible for the storage and management of one or more databases. Examples of DBMSs are Oracle 9i, DB2, Objectivity and MySQL.

Range of uses of databases on the Grid

  • Metadata: This is data about data, and is important as it adds context to the data, aiding its identification, location and interpretation. Key metadata includes the name and location of the data source, the structure of the data held within it, and data item names and descriptions.
  • Provenance: A type of metadata that provides information on the history of data. It includes information on the data’s creation, source and owner, and on what processing has taken place.
  • Knowledge repositories: Information on all aspects of research can be maintained through knowledge repositories.
  • Project repositories: Information about specific projects can be maintained through project repositories.

 

Grid applications and their database requirements:

The range of facilities already offered by existing DBMSs will be required. These support both the management of data and the management of the computational resources used to store and process that data. Specific facilities include:

  1. Query and update facilities
  2. Programming interface
  3. Indexing
  4. High availability
  5. Recovery
  6. Replication
  7. Versioning
  8. Evolution
  9. Uniform access to data and schema
  10. Concurrency control
  11. Transactions
  12. Bulk loading
  13. Manageability
  14. Archiving

The Grid is intended to support the wide-scale sharing of large quantities of information. This scale and openness add further requirements that Grid-enabled databases will have to meet:

  1. Scalability
  2. Handling unpredictable usage
  3. Metadata-driven access
  4. Multiple database federation
Grid and database interconnection in the present scenario

  • The dominant middleware used for building computational grids is Globus, which provides a set of services covering grid information, resource management and data management.
  • An orthogonal component that runs through all Globus services is the Grid Security Infrastructure (GSI). This addresses the need for secure authentication and communications over open networks.

Integrating the database into the grid: Many different interfaces could be designed to meet the requirements within the proposed framework, though we hope that work within the Global Grid Forum will lead to the definition of interface standards. Initially, the service wrappers will have to be custom produced. In the future, if the commercial importance of the Grid increases and standards are defined, it is to be hoped that DBMS vendors will offer Grid-enabled service interfaces as an integral part of their products. Some of the services are:

  • Metadata
  • Query
  • Transaction
  • Bulk loading
  • Notification
  • Scheduling
  • Accounting
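To make the idea of a custom service wrapper concrete, here is a minimal sketch that wraps an SQLite database (via Python's standard `sqlite3` module) behind four of the services listed above: metadata, query, transaction and bulk loading. The class `SqliteGridWrapper` and its method names are assumptions made for illustration; they do not follow any Global Grid Forum standard, and notification, scheduling and accounting are left out.

```python
import sqlite3


class SqliteGridWrapper:
    """Hypothetical service wrapper; the method names are illustrative only."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def metadata(self):
        # Metadata service: describe each table and its column names.
        tables = [row[0] for row in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        return {t: [col[1] for col in
                    self.conn.execute(f"PRAGMA table_info({t})")]
                for t in tables}

    def query(self, sql, params=()):
        # Query service: intended for SELECT statements; returns all rows.
        return self.conn.execute(sql, params).fetchall()

    def transaction(self, statements):
        # Transaction service: apply all statements or none of them.
        with self.conn:  # commits on success, rolls back on an exception
            for sql, params in statements:
                self.conn.execute(sql, params)

    def bulk_load(self, table, rows):
        # Bulk loading service: insert many rows in a single call.
        placeholders = ", ".join("?" for _ in rows[0])
        with self.conn:
            self.conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return len(rows)


db = SqliteGridWrapper()
db.transaction([("CREATE TABLE runs (id INTEGER, detector TEXT)", ())])
loaded = db.bulk_load("runs", [(1, "atlas"), (2, "cms")])
schema = db.metadata()
```

Because the application only sees the wrapper's methods, a different DBMS could in principle be substituted behind the same interface, which is the point of agreeing on interface standards.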

Federating the database systems across the grid:

  • One option is for a Grid application to interface directly to the service interfaces of each of the set of DBSs whose data it wishes to access.
  • The second option is to use Grid-enabled middleware to produce a single, federated ‘virtual database system’ to which the application interfaces.
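The second option can be sketched as a thin layer that forwards one query to every member database and merges the answers. The example below is a toy under strong assumptions: all members are in-memory SQLite databases with an identical schema, and `VirtualDBS`, `make_site` and the `readings` table are made up for illustration. Real federation middleware would also have to reconcile differing schemas and DBMS products.

```python
import sqlite3


class VirtualDBS:
    """Hypothetical federated 'virtual database system' over several DBSs."""

    def __init__(self, members):
        self.members = members  # dict: site name -> sqlite3 connection

    def query(self, sql, params=()):
        # Send the same query to every member and tag each row with its site.
        results = []
        for site, conn in sorted(self.members.items()):
            for row in conn.execute(sql, params):
                results.append((site,) + tuple(row))
        return results


def make_site(rows):
    # Build one member database holding toy sensor readings.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


vdb = VirtualDBS({
    "site_a": make_site([("s1", 12), ("s2", 40)]),
    "site_b": make_site([("s3", 55)]),
})
high = vdb.query("SELECT sensor FROM readings WHERE value > ?", (30,))
```

The application issues a single query and never needs to know how many databases sit behind the virtual one.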

Two different scenarios can be envisaged for the creation of a Virtual DBS:

  • A user decides to create a Virtual DBS that combines data and services from a specific set of DBSs that he/she wishes to work with. These may, for example, be well known as the standard authorities in his/her field.
  • A user wishes to find and work with data on a subject of his/her interest, but he/she does not know where it is located.
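In the second scenario the member databases have to be discovered before they can be federated, which is where metadata-driven access comes in. A minimal sketch, assuming each DBS publishes a set of subject keywords to some registry (the `DBSEntry` type, the `discover` function and the registry contents are all hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBSEntry:
    name: str
    subjects: frozenset  # subject keywords the DBS advertises


def discover(registry, subject):
    # Return the names of all registered DBSs that advertise the subject.
    return sorted(entry.name for entry in registry
                  if subject in entry.subjects)


registry = [
    DBSEntry("protein-db", frozenset({"proteins", "structure"})),
    DBSEntry("sky-survey", frozenset({"astronomy", "images"})),
    DBSEntry("fold-db", frozenset({"proteins", "folding"})),
]
matches = discover(registry, "proteins")
```

The names returned by `discover` would then be the set of DBSs from which a Virtual DBS is built, as in the first scenario.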

_________________________________________________________

Open grid service and data architecture

 

Data Grid: Data Grids address computational and data-intensive applications that combine very large datasets and a wide geographical distribution of users and resources.

Desirable features of the grid

  • The sharing of resources must be: flexible, secure, coordinated, robust, scalable, ubiquitously accessible, measurable (QoS metrics) and transparent to the users.
  • The distributed resources must be: interoperable, manageable, available and extensible.

 Virtual organisation:  A Virtual Organization (VO) is defined as a group of individuals or institutes who are geographically distributed but who appear to function as one single unified organization.

The members of a VO usually have a common focus, goal or vision, be it a scientific quest or a business venture. They collectively dedicate resources, for which a well-defined set of rules for sharing and quality of service (QoS) exists, to this end.

 Needs for desirable features: The access to the data has to be secured, enabling different levels of security according to the needs of the VO in question, while at the same time maintaining the manageability of the data and its accessibility (SECURE, COORDINATED, MANAGEABLE). The reliability and robustness requirements in the existing communities interested in Data Grids are high, since the computations are driven by the data – if the data are not accessible, no computation is possible.

 

The OGSA approach: In OGSA, each VO builds its infrastructure from existing Grid service components and has the freedom to add custom components, which have to implement the necessary interfaces to qualify as Grid services. OGSA focuses on the nature of the services that make up the Grid.

In OGSA, existing Grid technologies are aligned with Web service technologies in order to profit from the existing capabilities of Web services, including

  • service description and discovery,
  • automatic generation of client and server code from service descriptions,
  • binding of service descriptions to interoperable network protocols,
  • compatibility with higher-level open standards, services and tools, and
  • broad industry support.

OGSA introduces the concept of Grid Service Handles (GSHs) that are unique to a service and by which the service may be uniquely identified and looked up in the Handle Map.
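The role of the handle can be illustrated with a small lookup table. The sketch below assumes, as a simplification, that the Handle Map resolves a stable handle to whatever reference is currently needed to reach the service, so the handle stays valid even if the service moves. The `HandleMap` class, its methods and the `gsh://` handle format are invented for this example and are not the OGSA interfaces.

```python
class HandleMap:
    """Toy handle map: stable service handle -> current service reference."""

    def __init__(self):
        self._references = {}

    def bind(self, handle, reference):
        # A handle names exactly one service; rebinding updates where it lives.
        self._references[handle] = reference

    def resolve(self, handle):
        # Look the service up by its handle.
        try:
            return self._references[handle]
        except KeyError:
            raise LookupError(f"unknown handle: {handle}") from None


handle_map = HandleMap()
gsh = "gsh://example.org/replica-catalog/42"  # made-up handle format
handle_map.bind(gsh, {"host": "node1.example.org", "port": 8080})
handle_map.bind(gsh, {"host": "node7.example.org", "port": 8080})  # service moved
endpoint = handle_map.resolve(gsh)
```

Clients keep only the handle and resolve it each time they need to contact the service.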

Data Grid services: These comprise three main components:

  1. Data: In principle, Data Grids need to be able to handle data elements ranging from single bits to complex collections and even virtual data, which must be generated upon request. All kinds of data need to be identifiable through some mechanism (a logical name or ID) that in turn can be used to locate and access the data.

  2. Functionality and the services: The following play an important role in the functionality:
  • VO management
  • Data transfer
  • Data storage
  • Data management
  • Metadata
  • Security
  • Control and monitoring
  • Reliability and fault tolerance

  3. Data Grid and OGSA: OGSA introduces several service concepts that need to be adopted in order to qualify as Grid services. Necessary components are:
  • Factories
  • Registries
  • Service lifetime management
  • Data lifetime management
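How factories, registries and service lifetime management fit together can be shown with a short sketch. It assumes a soft-state model in which each service instance carries an expiry time that clients may extend, and a registry that drops expired instances. All class and method names here are hypothetical, time is passed in as plain numbers to keep the example deterministic, and data lifetime management is not covered.

```python
import itertools


class ServiceInstance:
    def __init__(self, handle, kind, expires_at):
        self.handle = handle
        self.kind = kind
        self.expires_at = expires_at  # time after which the instance is dead

    def keep_alive(self, now, ttl):
        # Clients extend the lifetime while they still need the service.
        self.expires_at = now + ttl


class Registry:
    def __init__(self):
        self.instances = {}

    def register(self, instance):
        self.instances[instance.handle] = instance

    def find(self, kind, now):
        # Service discovery: list live instances of the requested kind.
        return sorted(h for h, s in self.instances.items()
                      if s.kind == kind and s.expires_at > now)

    def sweep(self, now):
        # Lifetime management: drop instances whose lifetime has run out.
        expired = [h for h, s in self.instances.items() if s.expires_at <= now]
        for handle in expired:
            del self.instances[handle]
        return expired


class Factory:
    def __init__(self, kind, registry):
        self.kind = kind
        self.registry = registry
        self._ids = itertools.count(1)

    def create(self, now, ttl):
        # Create a new service instance and publish it in the registry.
        instance = ServiceInstance(f"{self.kind}-{next(self._ids)}",
                                   self.kind, now + ttl)
        self.registry.register(instance)
        return instance


registry = Registry()
factory = Factory("transfer", registry)
first = factory.create(now=0, ttl=10)
second = factory.create(now=0, ttl=10)
second.keep_alive(now=8, ttl=10)   # only the second instance is renewed
expired = registry.sweep(now=12)
alive = registry.find("transfer", now=12)
```

Letting instances expire unless renewed means that resources held by abandoned services are reclaimed without any explicit shutdown call.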

Issues: Various issues that need to be considered for the improvement of the system are:

  1. Availability
  2. Scalability
  3. Monitorability
  4. Integration
  5. Security
  6. Interoperability and compatibility
  7. Service discovery
  8. Manageability
