“There are many ways to define distributed computing, and there are many different levels and types of distributed computing models and distributed application development techniques. Various vendors have created and marketed distributed computing systems for years, and numerous initiatives and architectures have been developed to permit distributed processing of data and objects across a network of connected systems.” An excellent, must-read introduction to distributed computing. In particular, have a look at its interesting comparison with clusters and supercomputers.
Looking at the comparison chart, there is no way anyone would ever buy a cluster, much less a supercomputer. Why bother? In fact, distributed computing has rather limited application overlap with supercomputers and clusters. Distributed computing fails miserably for the majority of algorithms run on supercomputers. Often even clusters have a hard time reaching the performance benchmarks of true big-iron supercomputers. As soon as large-scale cross-node data transactions are required to solve the problem, there is no way distributed computing will produce a timely result against a cluster, much less a full-fledged supercomputer.
Distributed computing is great when each node works completely independently, as in the SETI project or gene folding. Throw in a problem like fluid-flow analysis, and the time it takes for the nodes to communicate with each other will be so large that it may not even be worth splitting the problem up among the nodes. On that note, let’s not even think about the other design headaches involved with a heterogeneous group of machines, much less one whose performance on an algorithm can’t be guaranteed. To quote Entropia, the algorithm should be “loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio.”
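A back-of-the-envelope model makes the point. All numbers below (node count, round-trip latency, synchronization rounds) are purely illustrative assumptions, but they show why the compute-to-data ratio decides whether distribution is worth anything:

```python
# Toy model of when distribution pays off; every figure is an
# illustrative assumption, not a measurement of any real system.

def distributed_time(work_s, nodes, sync_rounds, round_trip_s):
    """Ideal parallel compute time plus communication overhead:
    each synchronization round costs one network round trip."""
    return work_s / nodes + sync_rounds * round_trip_s

serial = 10_000.0  # seconds of single-node work

# SETI-style: embarrassingly parallel, one exchange at start and end.
seti = distributed_time(serial, nodes=100, sync_rounds=2, round_trip_s=0.5)

# Fluid-flow-style: neighbours must exchange boundary data every step.
cfd = distributed_time(serial, nodes=100, sync_rounds=100_000, round_trip_s=0.5)

print(f"serial: {serial:.0f}s  seti-like: {seti:.0f}s  cfd-like: {cfd:.0f}s")
```

With these made-up numbers the SETI-like job finishes about 100 times faster than one machine, while the tightly coupled job ends up several times slower than not distributing it at all.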
The author touches on this much more lightly than on all the grand benefits of the distributed computing movement. I’m all for SETI and such projects, but when you get down to it, the sites mentioned are really pushing for an unstructured cluster-type computer more than for randomly distributed workstations.
Distributed computing runs into its biggest problems when real-time applications are involved. I previously tried to do load balancing for multimedia applications but finally gave up. Given the unpredictability of the Internet, it’s almost impossible to handle timing constraints while balancing these data-intensive applications at any scale larger than a LAN.
Academic research on distributed computing has been carried out for years. I just don’t know why they brought it up again. Have any killer applications appeared?
Since I’m currently working on a cluster, I can base my answer on actual experience:
* Firstly, the initial cost of a cluster is not high. As a matter of fact, our cluster is made out of computers that were bound for the warehouse. Initial cost of our cluster: $150 for a spool of cat5 cable, rj-45 plugs and some replacement cpu fans. I wonder how that can be much more expensive than maintaining the high-speed server needed for a distributed computing network.
* Scalability is not limited on a MOSIX-type cluster. It is possible to use a complex network topology and even to connect clusters on different networks. Yes, the effort to maintain a cluster of such complexity rises exponentially, but even that is not such a big issue.
* A cluster doesn’t require dedicated hardware. It benefits from it, but so does a distributed computing network. And a network’s cost is the sum of the costs of the individual machines, unless the members of the network get their machines for free, which I doubt.
* Nodes need not be dedicated. MOSIX allows nodes to be dedicated, to be dedicated only at certain times, or not to be dedicated at all but still able to spawn processes.
* Risk of failure – high? Excuse me for being blunt, but what was that person smoking when he/she wrote that? (And where can I get some for myself?)
* Lost productivity downtime on a cluster? Currently the only ways the whole cluster goes down are when the dhcp server dies or when the hub/switch catches fire. And doesn’t a distributed computing network need to be maintained too? Or is that done by the networking faeries?
* How long have PVM and MPI been out there? Are they obsolete yet?
As Hank also mentioned above, the chart doesn’t consider the need for fast internode communication.
There’s also the security issue: how well can you trust the results from machines on the network, and once you send off a process, how sure are you that it will be completed?
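One common answer is to send the same work unit to several independent machines and only accept an answer a quorum of them agrees on, rescheduling the unit otherwise. A minimal sketch of that idea; the quorum size and the sample results are assumptions for illustration:

```python
from collections import Counter

def accept_result(replica_results, quorum=2):
    """Accept a work unit's answer only if at least `quorum` independent
    machines returned the same value; otherwise signal that the unit
    should be resent (return None)."""
    value, votes = Counter(replica_results).most_common(1)[0]
    return value if votes >= quorum else None

print(accept_result([42, 42, 7]))   # two nodes agree -> accept 42
print(accept_result([42, 7, 13]))   # no agreement -> None, resend the unit
```

The cost, of course, is that every unit is computed two or more times, which only makes sense because idle cycles on a distributed network are nearly free.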
It seems to me that clusters and distributed networks overlap to a considerable degree. One could say that, in the general case, a cluster is a distributed network in one location.
What characteristics distinguish a cluster from a distributed computer? Other than the difference in physical location, the main differences seem to be dedicated vs. shared with other non-distributed applications, communication interconnect technology (Internet vs. Ethernet vs. high-speed card interconnect), ownership, custom vs. commodity hardware, and type of software.
As with all technology, it’s important to make choices based upon the needs of the task at hand. What is the purpose? How can technology be applied to the problem to provide solutions that work within reasonable amounts of time and resources? The article seems to approach this as a guide to navigating the issues involved in making informed choices. In that regard it’s important that the information about each deployment scenario be correct and complete.
A compelling advantage of cheap commodity computing and communications hardware is that it allows a cluster or distributed network to grow at such a low cost that many organizations can afford it. This advantage will spur the development of new innovations, algorithms and software solutions that we can’t yet anticipate. People will shoehorn their applications into these systems even if they don’t quite fit. New ways of organizing these applications may be discovered, enabling efficient solutions where none were known before. Knowledge of how to implement cluster and distributed systems will spread throughout the technical computing communities; more solutions will be tried, more will fail and more will succeed.
A distributed algorithm does not need to be optimal to succeed, as long as there is more CPU power and communications bandwidth available than required. This has a nice fault-tolerant aspect that is being exploited in many ways. It’s also a compelling reason for vast distributed networks using the Internet.
A general-purpose distributed system must be “self-reflective”, so that it can determine when and how to farm out portions of the tasks at hand to other nodes in the system. Currently this is done manually by programmers. Surely systems can be built that take this expert knowledge and automate its essential aspects. Is this portion of an application CPU-bound, or is it communications-bound? What is the mix? Is it worthwhile to fork off this piece and run it as a separate thread, possibly on a separate box? What information and other resources will it require? Does it need access to a large database? Will that access make it communications-bound and make the solution take longer? Will it take longer to solve if we move this chunk to one or more other boxes? What about synchronization issues? How does this piece need to coordinate with other pieces? What effects will that have? How much guidance from a user is required in the decision-making process? Which choices can the user override? There are many more questions along this line worth asking.
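As a toy illustration of what automating that expert knowledge might look like, here is a hedged sketch of the fork-or-not decision, reduced to comparing remote compute savings against the cost of shipping the fragment’s data. Every parameter (speedups, bandwidth, sizes) is an assumption, not a measured value:

```python
def should_offload(compute_s, input_bytes, output_bytes,
                   local_speedup=1.0, remote_speedup=4.0,
                   bandwidth_bps=1_000_000):
    """Crude 'self-reflective' check: offload a task fragment only if
    the remote compute savings outweigh the cost of shipping its data.
    All default parameters are hypothetical."""
    transfer_s = (input_bytes + output_bytes) / bandwidth_bps
    local_s = compute_s / local_speedup
    remote_s = compute_s / remote_speedup + transfer_s
    return remote_s < local_s

# CPU-bound fragment with little data: worth forking off.
print(should_offload(compute_s=600, input_bytes=10_000, output_bytes=1_000))
# Data-bound fragment that needs a large database slice: keep it local.
print(should_offload(compute_s=5, input_bytes=500_000_000, output_bytes=1_000))
```

A real system would of course need measured profiles rather than constants, plus the synchronization and coordination questions above, but the skeleton of the decision is this simple comparison.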
I wonder if current programming languages are too rigid in how they implement programs. A powerful “re-compiler”, “program reorganizer” or “program generator” are some approaches to implementing an automated decision-making system for distributing pieces of an application dynamically and automatically. Case-based reasoning, rule-based systems and decision trees may help and are known technology workhorses. Practical solutions require implementations that work, not 30 years in the future but today.
I suggest you go out and build a cluster or distributed network computer (or even a big-iron box), depending on what your projects need. Spread the knowledge and solutions and, most important of all, succeed in your endeavours with the help of others as required.
This seems like typical ZD trash that looks more like propaganda for d.net or seti@home than an editorial. The author of the article doesn’t seem to really understand what they are talking about as far as supercomputers vs. clusters vs. distributed computing goes.
Clusters scale well, protect you from single-node failure (hardware and software) very well, and require only as much administration as a single system (albeit one probably running a lot of services). Basically, clusters take the good part of distributed computing, cheap heterogeneous hardware, and put the nodes spatially local, so that the bandwidth can be higher, as can the level of trust. They give you something that costs much less than a supercomputer, performs very well, and scales much more easily.
Distributed computing requires more than one IT person to monitor it, you can’t trust the clients in the general case, and the scalability isn’t limitless. Data processing still needs a clearing house where work is sent out from and returned to, and most jobs of this type aren’t like seti@home, where the data sets are small and the calculations hard. The data sets are typically large and need to be shipped to and from the client nodes; the author seems to disregard this bandwidth issue when judging scalability.
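That bandwidth ceiling is easy to estimate: if a central clearing house must ship every work unit over its own link, the number of clients it can keep busy is roughly the compute time per unit divided by the transfer time per unit. A sketch with purely illustrative figures:

```python
def throughput_limited_nodes(unit_mb, unit_compute_s, server_bandwidth_mbps):
    """Rough cap on how many client nodes a central clearing house can
    keep busy before its own network link, not client CPU, limits
    scaling.  All inputs are illustrative assumptions."""
    send_s = unit_mb * 8 / server_bandwidth_mbps   # time to ship one unit
    return unit_compute_s / send_s                  # clients the link can feed

# seti@home-like: tiny unit, hours of crunching -> scales to huge node counts.
print(throughput_limited_nodes(unit_mb=0.35, unit_compute_s=36_000,
                               server_bandwidth_mbps=100))
# Data-heavy job: big unit, minutes of crunching -> the cap is a handful of nodes.
print(throughput_limited_nodes(unit_mb=1_000, unit_compute_s=600,
                               server_bandwidth_mbps=100))
```

With these made-up numbers the small-unit job could feed over a million clients, while the data-heavy job saturates the server link with fewer than ten, which is exactly the overlooked limit.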