Abstract:
The commonly used statistic D2 compares two sequences by taking the scalar product of their k-word counts. We show that this statistic has surprising properties that severely limit its utility. We propose alternative statistics that show some improvement.
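
For concreteness, the following is a minimal sketch of how D2 could be computed, assuming that k-words are counted with overlaps and that words absent from a sequence contribute a count of zero. The function names `kword_counts` and `d2` are illustrative and are not taken from the paper.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping k-word (substring of length k) in seq."""
    # Assumption: overlapping occurrences are counted; a sequence shorter
    # than k yields an empty counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Scalar product of the k-word count vectors of two sequences."""
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Words missing from counts_b have count 0, so only words present
    # in both sequences contribute to the sum.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGT", "ACGA", 2))
```

In this sketch the two example sequences share the 2-words AC and CG once each, so only those two words contribute to the sum.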