Data locality and workload balance are key factors for getting high performance out of data-parallel programs on multiprocessor architectures. Data-parallel languages such as High Performance Fortran (HPF) offer thus means allowing a programmer both to specify data distributions as well as to change them dynamically in order to maintain these properties. On the other hand, redistributions can be quite expensive, and significantly degrade a program's performance. They must thus be reduced to a minimum. In this article we present a novel, aggressive approach for avoiding unnecessary remappings which works by eliminating partially dead and partially redundant distribution changes. Basically, this approach evolves from extending and combining two algorithms for these optimizations achieving each on its own optimal results. In distinction to the sequential setting, the data-parallel setting leads naturally to a family of algorithms of varying power and efficiency allowing requirement-customized solutions. The power and flexibility of the new approach are demonstrated by various examples, which range \from typical HPF fragments to real world programs. Performance measurements underline its importance and show its effectivity on different hardware platforms and different settings.