FPGAs are a way to get your concept up and running (at lower performance) while you're still working it out, but there's no reason you can't skip that step entirely if you already have a working design, or even just completed specifications. Simulation using the vendor's cells/macros/constraints is a necessity anyway, since you can't afford to mess up.
The ASIC bitcoin miners I've seen are built using gate arrays, which makes a lot of sense: SHA can be implemented efficiently on gate arrays, and that sort of semi-custom process flow has both shorter lead times and much lower cost than going full-custom. This doesn't work for Scrypt, though. You either have to implement block RAM using gates (which burns up resources very quickly) or go to a full-custom approach so that you can drop lots of RAM blocks onto the die, and the second approach will easily cost you millions of dollars by the time you have production-ready silicon. That is very problematic in a market with such a large first-mover advantage.
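
To put rough numbers on the RAM problem, here is a back-of-the-envelope sketch in Python. It assumes the scrypt parameters N=1024, r=1 commonly used by scrypt-based coins (the post above doesn't name any parameters, so treat those as my assumption), and it counts only the bare working state for SHA-256, not what a real unrolled pipeline would hold. The function and constant names are made up for illustration.

```python
def scrypt_scratchpad_bytes(n: int, r: int) -> int:
    """Size of scrypt's ROMix scratchpad: N blocks of 128*r bytes each."""
    return 128 * r * n

# Rough per-hash working storage for SHA-256 (an approximation):
# eight 32-bit chaining words plus a sixteen-word message schedule window.
SHA256_STATE_BITS = 8 * 32 + 16 * 32

# Assumed parameters, not taken from the post.
N, R = 1024, 1

scratch_bytes = scrypt_scratchpad_bytes(N, R)
scratch_bits = scratch_bytes * 8

print(f"scrypt scratchpad per core: {scratch_bytes} bytes ({scratch_bits} bits)")
print(f"SHA-256 working state:      {SHA256_STATE_BITS} bits")
print(f"ratio:                      ~{scratch_bits // SHA256_STATE_BITS}x")
```

Under those assumptions each scrypt core needs 128 KiB of scratchpad, i.e. about a million storage bits. Build each bit out of gate-array flip-flops instead of a dense RAM macro and you can see where the resources go.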