I'd like your opinion about something. I've been toying around for a while with the idea of creating a huge parallel computer out of old machines. I see old, cheap computers all the time, and it occurs to me that it might be possible to build an extremely fast machine out of them for relatively little money. Unfortunately, I know very little about parallel computing, and I don't know exactly how I would go about linking these computers together, at least on the software side. So let me ask some specifics.

Firstly, most of the really cheap computers you can find are x86, and most were designed specifically with a Windows platform in mind. Whatever software I used would certainly need to run on x86, and it would probably be best if it were Windows-based. I could possibly use Unix or Linux instead, but I have no experience with them, and my perception is that a lot of the software I might want to run on a parallel system isn't available for those operating systems. In any case, what software would you recommend for parallel computing, and which operating system?

Secondly, a large number of the cheap computers you can find are REALLY slow (200-300 MHz). It might take a lot of them to amount to anything, and I have heard that there is a limit to how much processing power you can gain relative to the number of computers used. What do you think the minimum specs of each computer would need to be for this to be worth my time and money?

Thirdly, do you think this idea is actually feasible? Overall, do you think one could go around collecting computers at yard sales and actually create a really fast computer? My thinking is that separately they are obsolete, but together they could be faster than current machines. And if you think that is possible, do you think I would really be able to use it for anything useful? I've no idea if any of you have knowledge on this matter, but any thoughts are welcome.
Sadly, while it is a fun idea to set up, it is generally not worthwhile when you are using computers that slow. The problem is that parallel computers are most useful for running multiple processes or jobs that can be broken up (like the folding projects and render farms). If you are not using software designed to take advantage of hardware like this, you are likely to see very poor performance. The major bottleneck in systems like these is network transfer time. To do anything effectively, the systems need to be networked with the best equipment you can get. Standard 100 Mbps connections tend to be slow for applications that require constant communication, but they can be fine for "break up and send" style processing. If you really want to look into this, look up the Beowulf clustering project. I believe the software is open source, and it is basically what you are looking to set up.
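To give you a feel for what "break up and send" processing looks like in practice, here is a rough sketch using MPI through the mpi4py Python package. Treat it as an illustration only: the host file, node count, and the toy job (summing squares) are placeholders I made up, not something taken from the Beowulf docs.

    # Minimal "break up and send" sketch with MPI via mpi4py.
    # Run it across the cluster with something like:
    #   mpiexec -hostfile hosts -n 4 python sum_squares.py
    # (the hosts file, node count, and the job itself are made-up examples)
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # which node am I?
    size = comm.Get_size()   # how many nodes in total?

    if rank == 0:
        # The head node breaks the work into one chunk per node...
        numbers = list(range(1000000))
        chunks = [numbers[i::size] for i in range(size)]
    else:
        chunks = None

    # ...each chunk travels over the network once...
    my_chunk = comm.scatter(chunks, root=0)

    # ...every node crunches its own piece locally, with no network traffic...
    partial = sum(x * x for x in my_chunk)

    # ...and the partial results are gathered back and combined on the head node.
    partials = comm.gather(partial, root=0)
    if rank == 0:
        print("sum of squares:", sum(partials))

The point to notice is that the network is only touched twice (the scatter and the gather), which is why this style tolerates slow 100 Mbps links far better than anything needing constant communication.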
If it isn't worthwhile for computers that slow, how fast do you think they would need to be? 1 GHz, maybe? 1 GHz per computer would add up pretty fast, I should think. By the way, that Beowulf clustering project site looks very promising. Thank you for the advice!
Well, it all depends on what you want to do with the setup. If you just want to play around with it, then take whatever computers you can get. However, if you want to actually use the setup on a regular basis for computationally intensive applications, then you are going to have to weigh the pros and cons of a smaller number of faster computers versus a larger number of slower computers. One thing you should also factor in is the cost of electricity. If you have a large number of computers, then you are going to need to feed a large amount of power into them. I am not just talking about the kilowatt-hour cost, but the cost of providing enough circuits to run everything. Don't forget to include all of the networking hardware (hubs, switches, etc.) that needs to be powered too.
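To put a rough number on the electricity point, here is a back-of-the-envelope calculation. Every figure in it (node count, wattage, price per kWh) is a guess of mine, so plug in your own:

    # Back-of-the-envelope power cost; all of the figures below are assumptions.
    nodes = 20               # salvaged PCs
    watts_per_node = 120     # an old desktop under load, rough guess
    network_watts = 50       # switch/hub overhead, rough guess
    hours_per_month = 24 * 30
    price_per_kwh = 0.12     # in dollars; varies a lot by region

    kwh = (nodes * watts_per_node + network_watts) * hours_per_month / 1000
    print(f"~{kwh:.0f} kWh/month, ~${kwh * price_per_kwh:.0f}/month")
    # prints roughly 1764 kWh/month and about $212/month with these guesses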
I'm not entirely sure what I want to do with it. I know that, in general, programs have to be written specifically to run on a parallel computing network, but have any significant attempts been made to automatically spread the processes of a given program over the network? And beyond that, what kinds of programs have been written for that kind of setup? Has anyone adapted the kind of programs that average people use daily for a parallel computing network? EDIT: Actually, I can think of one thing I would definitely like to use it for: video editing. It would be awesome to be able to edit HD video very quickly. Are there any video editing programs for parallel computing networks?
Well, actually it has been done, although it's not "process splitting" but rather "migration" of a process from one machine to another (with all the trouble that brings). As was said before, network latency is undoubtedly the bottleneck if you use standard wiring. Suppose you have two PCs, A and B: you could, for example, use one dedicated Ethernet connection (and by connection I mean a physical cable) for the traffic from A to B and another one for the traffic from B to A. That way you would be sure not to have any collisions and therefore no big network latency. Another option would be Gigabit LAN, but honestly I don't have any clue how well that works. I have to develop a project for a network computing exam at my university in the next few months. I was actually thinking about developing some sort of "code migrator", which lets you send pieces of code from a source host that is fully occupied to one which, for example, has nothing to do. It's nothing new, really, just a very simplistic take on parallel network computing, but I thought it could give you an idea. If you use slow computers you won't gain any advantage by performing a complete task on one single machine, but if you could split it up somehow and then spread it over the network, it would surely save time. That would truly be something powerful: emulating a multi-core CPU on a network basis. ;P
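If you only want to push independent chunks of work to whichever machine happens to be idle (rather than migrating a whole running process), you can sketch it with nothing fancier than a shared queue over the LAN. Here is a rough sketch using only the Python standard library; the host name, port, authkey, and the "work" itself are placeholders I invented:

    # "Send work to whoever is idle" over a LAN, standard library only.
    # Start "python work_queue.py head" on one machine and
    # "python work_queue.py worker" on each of the others.
    import sys
    import queue
    from multiprocessing.managers import BaseManager

    HEAD_HOST, PORT, KEY = 'headnode.local', 50000, b'cluster'  # made-up values

    class QueueManager(BaseManager):
        pass

    def run_head():
        tasks, results = queue.Queue(), queue.Queue()
        QueueManager.register('get_tasks', callable=lambda: tasks)
        QueueManager.register('get_results', callable=lambda: results)
        for chunk in range(100):          # independent pieces of work
            tasks.put(chunk)
        QueueManager(address=('', PORT), authkey=KEY).get_server().serve_forever()

    def run_worker():
        QueueManager.register('get_tasks')
        QueueManager.register('get_results')
        mgr = QueueManager(address=(HEAD_HOST, PORT), authkey=KEY)
        mgr.connect()
        tasks, results = mgr.get_tasks(), mgr.get_results()
        while True:
            chunk = tasks.get()               # pull work off the shared queue
            results.put((chunk, chunk ** 2))  # stand-in for the real computation

    if __name__ == '__main__':
        run_head() if sys.argv[1] == 'head' else run_worker()

It is nothing like real process migration, of course, but it shows the shape of the idea: the head node never cares which box does which chunk, and faster machines naturally end up pulling more chunks off the queue.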
Most systems of this type use exactly the same hardware setup per node in the grid/network. Unless you have a number of matching systems, it probably wouldn't be worthwhile - or even possible. These kinds of projects are usually for number crunching, not for gaming or similar. It wouldn't end up as a fast system, just a very powerful processing system.
Taucias: why do they need to have the same hardware setup? Here is an example to the contrary. Kammedo: can you list any examples of this process "migration" you're talking about? Would it work with a video editing program? As for the problem of network speed, couldn't you put two network cards in each computer and connect each one to one other computer, like a neural net?
Short answer: Having identical setups actually reduces the overall scheduling workload. Long answer: If everything is the same, then all process slices can be scheduled in the same way regardless of which system the work is being sent to. If there are differences, then the scheduler needs to determine which system is receiving the slice before preparing it to be sent and then needs to prepare the slice specifically for that hardware.
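To make the long answer concrete, here is a toy sketch of that extra bookkeeping; the benchmark scores are numbers I invented:

    # Splitting work units across nodes: identical vs. mixed hardware.
    # With identical nodes every slice is the same size and nothing has to be
    # looked up per node; with mixed nodes the scheduler first needs some
    # per-node benchmark and has to weight every slice by it.

    def identical_slices(total_units, num_nodes):
        # one-size-fits-all slices; leftover units ignored for simplicity
        return [total_units // num_nodes] * num_nodes

    def weighted_slices(total_units, benchmark_scores):
        # slice sizes proportional to each node's measured speed
        total = sum(benchmark_scores)
        return [round(total_units * s / total) for s in benchmark_scores]

    print(identical_slices(1200, 4))                    # [300, 300, 300, 300]
    print(weighted_slices(1200, [200, 300, 300, 400]))  # [200, 300, 300, 400]

And that is only the slicing; a scheduler for mixed hardware also has to measure the nodes in the first place, deal with rounding leftovers, and cope with the slowest box holding up the final combine step.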
Hmmm, why not let the fastest (and almost identical) PCs be the front line of the render farm, and then let the others, meaning the slower machines, be at the back of the render farm, and make them almost identical too, with matching hard drives, RAM, and so forth? The makers of Shrek used their whole network of PCs as a render farm when they weren't working on the next scene, and I don't believe the PCs there were identical. I know I am full of a lot of nonsense, but this does make a little sense to me. So my suggestion is to try to make them identical in pairs or triplets, and let the "best" PCs be the front runners of the render farm.
They need to be the same spec so that the results are predictable and arrive at an anticipated time; otherwise the performance you get out of the grid won't be very reliable for anything other than distributed processing of fixed packets of data (like SETI or similar). If you wanted a system that works in one of the modes of something like the Cell processor, where you split a task into chunks, farm the chunks out to processing units, and combine the results, then you need matching specifications on all machines, or the latency will be horrible and task delegation will get confusing. You couldn't run a real-time application without matching units, for example. My understanding of a neural net is quite different from simply daisy-chaining machines together.