I would say that any Classical rules should be vetted by these scenarios:
Chaeronea- because hoplites and cavalry and phalanxes can be tested against each other.
Issus- because Alexander should have lost, so one needs to understand how to factor that in. Most games have Alexander get handed his lunch in this scenario- what is wrong?
Sicilian Wars between Syracuse and Carthage- These armies have large mixes of various types of troops, and various classical phalanx units, and battles can then 'vet' many different interactions.
Heraclea- Because Pyrrhus vs. Rome is the perfect shell for a drawn game if both sides play well. Elephants.
Metaurus- Because the Carthaginians have to be really stupid to lose as badly as in reality.
Battles where the result is unbalanced by massive stupidity, or by a massive failed tactical gambit on one side, are not good tests.
Gaugamela- probably should not have been fought- but illustrates the frustration of the miniature painter who has a set of a dozen shiny scythed chariots only to find out they are worthless.
Zama- is marred by the failure of the elephant gambit- otherwise it is a good test of infantry types- until the flanks cave in from the returning cavalry. Why didn't Hannibal use his elephants as a cavalry screen? Were they needed to rub out the Roman velites- who had become unsung heroes of the new legions? If your rules answer those questions, then the rules are probably good.