2011-10-27

Python optimizations, third round up !

Thanks to @fijall (a pypy core developer), we gained 2 new implementations !
def htmlGB4():
    l = []
    for c in r:
        if (c>=u'a' and c<=u'z') or (c>=u'A' and c<=u'Z') or (c>=u'0' and c<=u'9'):
            l.append(chr(ord(c)))
        else:
            insert = '&#' + str(ord(c)) + ';'
            l.append(insert)
    return ''.join(l)
assert(html7() == htmlGB4())

from cStringIO import StringIO
 
def htmlGB5():
    s = StringIO()
    for c in r:
        if (c>=u'a' and c<=u'z') or (c>=u'A' and c<=u'Z') or (c>=u'0' and c<=u'9'):
            s.write(chr(ord(c)))
        else:
            insert = '&#' + str(ord(c)) + ';'
            s.write(insert)
    return s.getvalue()
The specialized pypy StringBuilder he passed on looks interesting but for some reason it didn't appear on the build I had on gentoo on PyPy 1.5 I ported the htmlGB5 to cython also. When no data appears it means I didn't ported it. Here is the new comparative graph, I omitted htmlGB3 because it was throwing the graph out of scale.
So pypy shines with the StringIO where it is more then twice as fast as cython where cython is more then twice as fast as python. Thank a lot for the contribution. Enjoy !

2011-10-26

Python optimizations continued ...

To give a fair treatment to pypy, I tried 2 c-like implementations similar to the cython one and the pypy 1.5 results are around 30% to 250% slower then standard python interpreter. Feel free to correct me if I did something obviously wrong here. Add those to test.py:
def htmlGB2():
    s = r
    a = array.array('c', itertools.repeat('\0', len(s)*10))
    i = 0
    for c in s:
        if (c>=u'a' and c<=u'z') or (c>=u'A' and c<=u'Z') or (c>=u'0' and c<=u'9'):
            a[i] = chr(ord(c))
            i += 1
        else:
            insert = '&#' + str(ord(c)) + ';'
            for cc in insert:
                a[i] = cc
                i += 1
    return a.tostring()

def htmlGB3():
    s = r
    result = ''
    for c in s:
        if (c>=u'a' and c<=u'z') or (c>=u'A' and c<=u'Z') or (c>=u'0' and c<=u'9'):
            result += chr(ord(c))
        else:
            result +='&#'
            result +=str(ord(c))
            result +=';'
    return result

⚫ python test.py
html7:   2.45566296577
htmlGB:   2.13124704361
htmlGB2:   11.926774025
htmlGB3:   3.90490102768

⚫ pypy-c1.5 test.py 
html7:    2.41169714928
htmlGB:   1.77979898453
htmlGB2:   15.9636659622 <-
htmlGB3:   91.441778183 <-

Python optimizations exercise.

From there I wanted to post cleanly all my findings. I include all my sources, so you can reproduce it. Basically, Pavel wanted to optimize a simple use case. Here is my shot at it: First the test.py, the html7 implementation and a slightly improved one where I don't force python to go back and forth between bytes and unicode + the benchmark timer :
import timeit


WHITELIST2 = set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
WHITELIST2_UNI = set(u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
r = u'234782384723!#$#@!%$@#%$@#%#$%%%%%%%%%%%@#$%342583492058934028590342853490285902344#$%%*%****%7jkb6h546777776ynkk4b56byhbh5j'*500

def html7():
    s = r
    lstr = str; lord = ord; lWHITELIST2 = WHITELIST2
    return "".join([
        c if c in lWHITELIST2
        else "&#" + lstr(lord(c)) + ";"
        for c in s])

def htmlGB():
    s = r
    lstr = unicode; lord = ord; lWHITELIST2 = WHITELIST2_UNI
    return u''.join([
        c if c in lWHITELIST2
        else u'&#' + lstr(lord(c)) + u';'
        for c in s])
assert(html7() == htmlGB())

t = timeit.Timer(stmt = html7)
print 'html7:' + str(t.timeit(number=100))

t = timeit.Timer(stmt = htmlGB)
print 'htmlGB:' + str(t.timeit(number=100))
It gives me that as timing under python and pypy 1.5 :
⚫ python test.py 
html7:2.48782491684
htmlGB:2.14983606339

⚫ pypy-c1.5 test.py
html7:2.44286298752
htmlGB:1.84629392624
Note: the other implementation with caching & statistical tryouts are too convoluted and too much on the speed over memory tradeoff for this simple task IMHO. Then I wanted to go further, so I made a simple port on cython then a "bare metal" version on cython, and I have been really impressed by it ! In order to compile in cython you need a simple makefile-like setup.py file:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("encode", ["encode.pyx"])]

setup(
  name = 'Encoding Test',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)
Then here are my implementations : cython is the direct port, cython_alt is the C-like bare metal one. In the file encode.pyx put:
WHITELIST = set(u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')

from libc.stdlib cimport malloc, free
from libc.stdio cimport sprintf

def cython(unicode s):
    return u''.join([
    c if c in WHITELIST
    else u'&#' + unicode(< long > c) + u';'
    for c in s])

def cython_alt(unicode s):
    cdef char * buffer = < char * > malloc(len(s) * 10)
    cdef int i = 0
    cdef Py_UCS4 c

    for c in s:
        if (c >= u'a' and c <= u'z') or (c >= u'A' and c <= u'Z') or (c >= u'0' and c <= u'9'):
            buffer[i] = < char > c
            i += 1
        else:
            sprintf(buffer + i,'&#%d;',c)
            while buffer[i]:
                i += 1
    result = < bytes > buffer
    free(buffer)
    return result
And the same test runner test_cython :
import timeit
from encode import cython
from encode import cython_alt


r = u'234782384723!#$#@!%$@#%$@#%#$%%%%%%%%%%%@#$%342583492058934028590342853490285902344#$%%*%****%7jkb6h546777776ynkk4b56byhbh5j'*500

def cython2():
    cython(r)

def cython_alt2():
    cython_alt(r)

assert(cython(r) == cython_alt(r))

t = timeit.Timer(stmt = cython2)
print 'cython:' + str(t.timeit(number=100))

t = timeit.Timer(stmt = cython_alt2)
print 'cython_alt:' + str(t.timeit(number=100))
To compile it just do :
python setup.py build_ext --inplace
The results :
⚫ python test_cython.py 
cython:1.70311307907
cython_alt:0.348756790161
Booya ! :)