Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. First, the use of a high-level programming paradigm dramatically reduces the time needed to create a parallel program, but sometimes at the cost of reduced performance. A source-to-source compiler has been employed to automatically compile programs written in a high-level programming paradigm into message passing codes. Second, manual program development using a low-level programming paradigm, such as message passing, enables the programmer to fully exploit a given architecture, at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyse and guide the application development by selectively instrumenting and measuring the code versions, comparing performance information across several program executions, computing a variety of important performance metrics, detecting performance bottlenecks, and relating performance information back to the input program. We present several experiments with SCALA applied to real-world applications. These experiments were conducted on an NEC Cenju-4 distributed memory machine and a cluster of heterogeneous workstations and networks.