The runtime cost of instance attribute access rather than local variable access can account for a quarter of a program’s run time; I just tried it on my phone:
Python 3.7.4 (default, Jul 28 2019, 22:33:35)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: class X:
...: def y(z):
...: return z.a + z.a + z.a + z.a + z.a
...: def w(z):
...: a = z.a
...: return a+a+a+a+a
...:
In [2]: x = X()
In [3]: x.a = 3
In [4]: x.y()
Out[4]: 15
In [5]: x.w()
Out[5]: 15
In [6]: %timeit x.y()
1.7 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.w()
1.11 µs ± 5.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is often surprising to novices in Python, but attribute access involves a hash table lookup.
Note that here we are comparing two instance attribute accesses against seven, not zero against five. Evidently each of them cost about 118 ns, so if we could reduce them to zero, the method call and return and four additions would cost only 870 ns, which is closer to half the runtime than ¾.
Moral: benchmark before pooh-poohing a hotspot.
Also though note that several thousand instructions is a pretty heavy price to pay for four integer additions.
> can account for a quarter of a program’s run time
It can but it will more likely account for much much less than that, unless all your programs are massive loops that do little more than access the same attribute repeatedly.
The fact that the difference is 10ms and 8ms respectively suggests that the speedup of attribute access isn't what's showing up in your measurements. In one case we access the slot "a" once; in the other case we access it five times. How can that be a 20% difference?
Note that here we are comparing two instance attribute accesses against seven, not zero against five. Evidently each of them cost about 118 ns, so if we could reduce them to zero, the method call and return and four additions would cost only 870 ns, which is closer to half the runtime than ¾.
Moral: benchmark before pooh-poohing a hotspot.
Also though note that several thousand instructions is a pretty heavy price to pay for four integer additions.