One of the problems with automating antivirus signature creation is that once a few AV vendors start detecting something as malicious, even if only heuristically, other AV vendors "automagically" follow suit soon afterwards without checking whether the file in question is in fact malicious, some going as far as creating specific signatures for it via automated systems.
An example of such a False Positive (FP) problem with automatic AV signature creation is the case of Fenomen Games (aka Gamecentersolution), by Legacy Interactive. Fenomen is a company that creates and distributes games. They do so via a number of "Game Downloaders" which basically allow users to choose and download different games on-the-fly. The problem is that these "Game Downloaders" share many characteristics with known "Trojan Downloaders", such as runtime packing and their behaviour (connecting to the Internet, downloading something, executing it and then exiting), so they naturally light up heuristic alarms like a Christmas tree.
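To see why behaviour alone cannot separate the two, here is a minimal sketch of a behaviour-based heuristic. The feature names, weights and threshold are hypothetical and chosen purely for illustration; they do not describe any real engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    """Observable traits of a sample (hypothetical feature set)."""
    runtime_packed: bool
    connects_to_internet: bool
    downloads_file: bool
    executes_download: bool
    exits_after_launch: bool


# Hypothetical weights: an assumption for this sketch, not real engine values.
WEIGHTS = {
    "runtime_packed": 2,
    "connects_to_internet": 1,
    "downloads_file": 2,
    "executes_download": 3,
    "exits_after_launch": 1,
}
THRESHOLD = 6  # hypothetical alarm threshold


def heuristic_score(sample: Behaviour) -> int:
    """Sum the weights of every trait the sample exhibits."""
    return sum(w for name, w in WEIGHTS.items() if getattr(sample, name))


def is_suspicious(sample: Behaviour) -> bool:
    return heuristic_score(sample) >= THRESHOLD


# Both kinds of downloader exhibit exactly the same traits.
game_downloader = Behaviour(True, True, True, True, True)
trojan_downloader = Behaviour(True, True, True, True, True)

print(heuristic_score(game_downloader), is_suspicious(game_downloader))
print(heuristic_score(trojan_downloader), is_suspicious(trojan_downloader))
```

Because the heuristic only sees the traits, both samples get the same score; telling them apart requires looking at what is actually downloaded and executed.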
After manual analysis, the only thing I found truly suspicious about it is the fact that we have over 200,000 unique "Game Downloaders" from Fenomen Games in our Collective Intelligence database. The ones I checked are not malicious in any way, nor do they do anything different from what they advertise (if you have evidence to the contrary please let me know). Fenomen seems pretty active from a partner/affiliate perspective and this could be the reason for the multitude of unique MD5s.
The problem with these detections is not the "heuristic" detections but the signature detections. Normally (traditionally, that is) a signature detection signifies a "100% known malicious" program. However, in today's world, where signatures are created automatically based on other criteria, False Positives are amplified and rolled over to other engines freely.
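The amplification effect can be sketched with a toy model. It assumes a naive rule of the form "create a signature once at least `min_vendors` other engines detect the sample"; the rule, the function names and the vendor labels are all hypothetical and do not describe any specific vendor's pipeline.

```python
def naive_cascade(initial_detectors, followers, min_vendors):
    """Followers auto-sign a sample once enough other engines flag it.

    Every detection counts, whether heuristic or automated, so each new
    auto-signature makes the next one more likely.
    """
    detecting = set(initial_detectors)
    auto_signed = []
    for vendor in followers:
        if len(detecting) >= min_vendors:
            detecting.add(vendor)
            auto_signed.append(vendor)
    return auto_signed


def guarded_cascade(verified_detectors, followers, min_vendors):
    """Same rule, but only analyst-verified detections count as evidence.

    Auto-signatures never feed back into the evidence pool.
    """
    if len(set(verified_detectors)) >= min_vendors:
        return list(followers)
    return []


# Two engines flag a clean file heuristically; nobody has verified it.
print(naive_cascade(["A", "B"], ["C", "D", "E"], min_vendors=2))
print(guarded_cascade([], ["C", "D", "E"], min_vendors=2))
```

In the naive version, two unverified heuristic hits are enough for every follower to end up with a signature; in the guarded version nobody signs until someone has actually analysed the file.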
Some statistics on detections per engine, based on the 200,000 Fenomen Games Downloader samples we have (names have been omitted to protect the "innocent"):
Scanner A 137,465 detections
Scanner B 101,061 detections
Scanner C 96,472 detections
Scanner D 68,264 detections
Scanner E 45,602 detections
Scanner F 38,027 detections
Scanner G 31,603 detections
Scanner H 28,152 detections
And so on…
These include both heuristic and signature detections. All of the latter are false positives by very well-known AV engines!
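To put the counts above in perspective, the following sketch converts them into a share of the sample set. It assumes the total is exactly 200,000, whereas the collection is described as "over" that figure, so the percentages are approximate and slightly on the high side.

```python
# Detection counts per engine, as listed above.
detections = {
    "Scanner A": 137_465,
    "Scanner B": 101_061,
    "Scanner C": 96_472,
    "Scanner D": 68_264,
    "Scanner E": 45_602,
    "Scanner F": 38_027,
    "Scanner G": 31_603,
    "Scanner H": 28_152,
}

TOTAL_SAMPLES = 200_000  # assumption: the rounded figure quoted in the text

# Share of the sample set flagged by each engine, in percent.
rates = {
    name: round(count / TOTAL_SAMPLES * 100, 1)
    for name, count in detections.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}% of samples flagged")
```

Under that assumption, Scanner A flags roughly two thirds of the samples and Scanner B about half of them.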
The other problem created by these "FPs generated by automated signature systems" is that, once considered malicious, samples of these FPs are included in the regular "collection sharing packages" exchanged amongst different AV labs and, more importantly, independent research and testing organizations. These types of organizations, which rely on multi-scanners to classify their testbeds, should take great care not to make the same mistake. So the next time you see detection rates based on AV signatures published in a magazine or website, you should be asking yourselves "what" is truly being tested.
All in all, automation at the lab is an absolute must for any AV vendor that wants to keep up with the large volume of new incoming malware. However, it is critical that these systems are well supervised, fine-tuned and backed by engineers who oversee the automatically generated signatures to avoid creating "fenomenal" false positive problems.