Is there a statistical test that can compare two ordered lists?

I would like to get a statistical test statistic to compare two lists. Suppose my Benchmark list is

Benchmark = [a,b,c,d,e,f,g]

and I have two other lists

A =  [g,c,b,a,f,e,d]
C = [c,d,e,a,b,f,g]

I want the test to tell me which list is closer to the Benchmark. The test should consider the absolute location, but also the relative location: for example, it should penalize the fact that in list A ‘g’ is at the start while in the Benchmark it is at the end (how far an element is from its true location), but it should also reward the fact that ‘a’ and ‘b’ are close to each other in list C, just as in the Benchmark. A and C are always shuffled versions of the Benchmark. I would like a statistical test or some kind of metric that tells me the orderings of lists such as A and C are not statistically different from that of the Benchmark, while that of some other list D is significantly different at a given threshold or p-value, such as 5%. And even among lists like A and C, the test should clearly rank which ordering is closer to the Benchmark.

Well, if you come to the conclusion that a metric will suffice, here you go:

def dist(a, b):
    # position in a of each element of b
    perm = [a.index(v) for v in b]

    # displacement: how far each element sits from its true location
    ret = 0
    for i, p in enumerate(perm):
        ret += abs(p - i)

    # coherence break: penalize neighbours in a that are pulled apart in b
    current = b.index(a[0])
    for v in a[1:]:
        new = b.index(v)
        ret += abs(new - current) - 1
        current = new

    return ret
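To see how the metric behaves on the lists from the question (with letters standing in for the elements; `dist` is restated here so the snippet runs on its own):

```python
def dist(a, b):
    # position in a of each element of b
    perm = [a.index(v) for v in b]
    ret = 0
    for i, p in enumerate(perm):
        ret += abs(p - i)  # displacement from true location
    current = b.index(a[0])
    for v in a[1:]:
        new = b.index(v)
        ret += abs(new - current) - 1  # neighbours pulled apart
        current = new
    return ret

benchmark = ["a", "b", "c", "d", "e", "f", "g"]
A = ["g", "c", "b", "a", "f", "e", "d"]
C = ["c", "d", "e", "a", "b", "f", "g"]

print(dist(benchmark, A))  # 23
print(dist(benchmark, C))  # 17, so C is closer to the Benchmark
```

C scores lower mostly because its displaced elements travel shorter distances and blocks like ‘a’, ‘b’ stay together, which is exactly the behaviour asked for.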

I’ve created a few samples to test this:

import random

ground_truth = [0, 1, 2, 3, 4, 5, 6]
samples = []
for i in range(7):
    samples.append(random.sample(ground_truth, len(ground_truth)))

samples.append([0, 6, 1, 5, 3, 4, 2])
samples.append([6, 5, 4, 3, 2, 1, 0])
samples.append([0, 1, 2, 3, 4, 5, 6])

def dist(a, b):
    # position in a of each element of b
    perm = [a.index(v) for v in b]

    # displacement: how far each element sits from its true location
    ret = 0
    for i, p in enumerate(perm):
        ret += abs(p - i)

    # coherence break: penalize neighbours in a that are pulled apart in b
    current = b.index(a[0])
    for v in a[1:]:
        new = b.index(v)
        ret += abs(new - current) - 1
        current = new

    return ret

for s in samples:
    print(s, dist(ground_truth, s))

The metric is a cost, that is, the lower it is, the better; I designed it to yield 0 iff the permutation is the identity. The job left for you, which nobody can do for you, is deciding how strict to be when evaluating samples with this metric, and that depends entirely on what you’re trying to achieve.
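If you do want an actual p-value rather than just a cost, one way to get it from this metric is a Monte Carlo permutation test: shuffle the benchmark many times to estimate the null distribution of `dist` under random ordering, then take the fraction of shuffles that score at least as close as your observed list. A minimal sketch of that idea (`p_value`, the trial count, and the seed are my own choices, not part of the answer above; `dist` is restated so it runs standalone):

```python
import random

def dist(a, b):
    # same metric as above, with the coherence term using b directly
    perm = [a.index(v) for v in b]
    ret = 0
    for i, p in enumerate(perm):
        ret += abs(p - i)
    current = b.index(a[0])
    for v in a[1:]:
        new = b.index(v)
        ret += abs(new - current) - 1
        current = new
    return ret

def p_value(benchmark, observed, trials=10_000, seed=0):
    """One-sided Monte Carlo p-value: fraction of random shuffles
    that are at least as close to the benchmark as `observed`."""
    rng = random.Random(seed)
    observed_cost = dist(benchmark, observed)
    hits = sum(
        dist(benchmark, rng.sample(benchmark, len(benchmark))) <= observed_cost
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)  # add-one so p is never exactly 0

benchmark = ["a", "b", "c", "d", "e", "f", "g"]
C = ["c", "d", "e", "a", "b", "f", "g"]
print(f"p = {p_value(benchmark, C):.3f}")
```

A small p means orderings as close as C rarely arise by chance; you would then reject at your chosen threshold, e.g. 5%. For 7 elements there are only 5040 orderings, so an exact enumeration over `itertools.permutations` is also feasible instead of sampling.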