Cluster Computing

Date: 9 Jul 2019

What is a cluster?
A cluster is a set of processors working in a parallel configuration. A centralized resource manager and scheduling system allocates resources, and the cluster nodes work cooperatively as a single unified system. This contrasts with a grid, where each node has its own resource manager and the nodes do not work as a single resource.

The cluster management system is the interface to the cluster. It provides centralized control, with full control over each component and over the system state.

Approach
Cluster computing is an approach to achieving high performance, high reliability, or high throughput by using a set of interconnected computers. High availability comes from the cluster's implicit redundancy: when one node fails, another node can take over its work.
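To make the redundancy argument concrete, here is a minimal sketch of how availability grows with the number of nodes. It assumes that node failures are independent and that the service stays up as long as at least one node is up; both are simplifying assumptions, and the function name cluster_availability is hypothetical.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is up.

    Assumes independent failures and that any single node can serve
    the workload (simplifying assumptions, not a general model).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


# Illustrative figures only: compare one node with two and three redundant nodes.
for n in (1, 2, 3):
    print(n, round(cluster_availability(0.9, n), 6))
```

Under these assumptions, every extra node multiplies the probability of a total outage by the failure probability of a single node.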

HPCC (High-Performance Computing Cluster) is an open source platform for processing big data in parallel. HPCC can perform data parallelism, pipeline parallelism and system parallelism.
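The difference between data parallelism and pipeline parallelism can be sketched in a few lines. This is an illustration only: it does not use the HPCC API, threads on one machine stand in for cluster nodes, and all function names (word_count, data_parallel, pipeline_parallel) are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def word_count(chunk: str) -> int:
    return len(chunk.split())


def data_parallel(chunks, workers: int = 3) -> int:
    # Data parallelism: the same operation runs on different partitions of the data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(word_count, chunks))


def pipeline_parallel(items):
    # Pipeline parallelism: different stages run concurrently,
    # each handing its output to the next stage through a queue.
    handoff = queue.Queue()
    done = object()  # sentinel that marks the end of the stream
    results = []

    def stage_one():
        for item in items:
            handoff.put(item.upper())
        handoff.put(done)

    def stage_two():
        while True:
            item = handoff.get()
            if item is done:
                break
            results.append(len(item))

    threads = [threading.Thread(target=stage_one),
               threading.Thread(target=stage_two)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


total_words = data_parallel(["a b c", "d e", "f"])
lengths = pipeline_parallel(["ab", "cde"])
```

In the first function every worker does the same job on its own slice of the input; in the second, each worker does a different job on every item as it flows past.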

Cluster computing platforms include Apache Hadoop, with its MapReduce processing engine, and Apache Spark.

Components
The basic components are the nodes, the operating system, the network switching hardware and the node/switch interconnect. The nodes are the computers themselves. One node functions as the master node and the other nodes are worker nodes. Whenever a node reaches its load threshold, further load is transferred to another node. This is called load balancing. Likewise, when a node fails, its in-progress tasks are transferred to another node. This makes the system fault tolerant.
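The load balancing and failover behaviour described above can be sketched as a toy simulation. This is a minimal sketch, not how any particular cluster manager works: the Node and Master classes, the capacity threshold and the least-loaded placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int  # hypothetical threshold: maximum tasks the node accepts
    alive: bool = True
    tasks: list = field(default_factory=list)

    def has_room(self) -> bool:
        return self.alive and len(self.tasks) < self.capacity


class Master:
    """Toy master node that assigns tasks to worker nodes."""

    def __init__(self, workers):
        self.workers = workers

    def submit(self, task: str) -> str:
        # Load balancing: skip workers at their threshold and
        # pick the least-loaded one among the rest.
        candidates = [w for w in self.workers if w.has_room()]
        if not candidates:
            raise RuntimeError("no worker has spare capacity")
        target = min(candidates, key=lambda w: len(w.tasks))
        target.tasks.append(task)
        return target.name

    def fail(self, name: str) -> None:
        # Fault tolerance: mark the node as down and move its
        # in-progress tasks to the surviving workers.
        node = next(w for w in self.workers if w.name == name)
        node.alive = False
        orphaned, node.tasks = node.tasks, []
        for task in orphaned:
            self.submit(task)


workers = [Node("w1", capacity=2), Node("w2", capacity=2), Node("w3", capacity=2)]
master = Master(workers)
placement = [master.submit(f"task{i}") for i in range(4)]
master.fail("w1")  # w1's tasks are redistributed to w2 and w3
```

After the simulated failure, all four tasks are still assigned to a live worker, which is the property the text calls fault tolerance. A real cluster would also have to detect the failure and restart or resume the interrupted work, which this sketch leaves out.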

The cluster operating system provides the interface between the user, the applications and the cluster. Because it is a distributed OS, it presents a single system image. That is, users feel that they are interacting with a single system.

Advantages

  • A high-speed LAN and parallelism ensure high processing power.
  • It is easy to manage because all the components combine to function as a single entity.
  • Clusters are scalable because it is easy to add more nodes.
  • Maintenance is easy because load balancing lets the other nodes absorb the work of a node taken out of service.
  • Because all the nodes are identical, if one node shuts down another node can be used.

Disadvantages

  • Cluster software such as Open Source Cluster Application Resources (OSCAR) must be installed.
  • Privacy issues may arise.
  • It can be difficult to find which component has a bug.