Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The runtime cost of instance attribute access rather than local variable access can account for a quarter of a program’s run time; I just tried it on my phone:

    Python 3.7.4 (default, Jul 28 2019, 22:33:35)           
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.                                                                                                     
    In [1]: class X:                                           
       ...:     def y(z):                                      
       ...:         return z.a + z.a + z.a + z.a + z.a
       ...:     def w(z):     
       ...:         a = z.a                                    
       ...:         return a+a+a+a+a
       ...:                                             

    In [2]: x = X()

    In [3]: x.a = 3                                     

    In [4]: x.y()                                           
    Out[4]: 15                                          

    In [5]: x.w()                                  
    Out[5]: 15
                                                            
    In [6]: %timeit x.y()                               
    1.7 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)                                                                                              

    In [7]: %timeit x.w()                                  
    1.11 µs ± 5.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is often surprising to novices in Python, but attribute access involves a hash table lookup.

Note that here we are comparing two instance attribute accesses against seven, not zero against five. Evidently each of them cost about 118 ns, so if we could reduce them to zero, the method call and return and four additions would cost only 870 ns, which is closer to half the runtime than ¾.

Moral: benchmark before pooh-poohing a hotspot.

Also though note that several thousand instructions is a pretty heavy price to pay for four integer additions.



> can account for a quarter of a program’s run time

It can but it will more likely account for much much less than that, unless all your programs are massive loops that do little more than access the same attribute repeatedly.


Compared to __slots__ (also Python 3.7.4)

Using your definition of class X

  %timeit x.w()
  313 ns ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Add __slots__

  class X:
    __slots__ = ('a')
    def w(z):
      a = z.a
      return a+a+a+a+a

  %timeit x.w()
  271 ns ± 7.13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
About 14% less time.


Your computer is evidently faster than my phone; how fast was y() for you?

Also you are missing a comma in your would-be tuple.


For what it's worth I got about 196 and 131 for the original y and w, and after adding __slots__ (with comma) I got 186 and 123.


The fact that the difference is 10ms and 8ms respectively suggests that the speedup of attribute access isn't what's showing up in your measurements. In one case we access the slot "a" once; in the other case we access it five times. How can that be a 20% difference?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: