Python tips

Consider writing your own generator.

The tip on xrange() later in this article hints at a general pattern for optimization: prefer generators where possible. They let you return one item at a time rather than all the items at once. The xrange() function in Python 2 and the range() function in Python 3 are lazy in a similar way (strictly speaking, they return lazy sequence objects rather than generators). If you’re working with lists, consider writing your own generator to take advantage of this lazy loading and memory efficiency. Generators are particularly useful when reading a large number of large files, because you can process one chunk at a time without worrying about the size of the files. Here’s an example you might use when web scraping and crawling recursively.

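import re
from urllib.parse import urljoin  # Python 3

import requests

def get_pages(link):
    """Yield each page URL as it is crawled, breadth-first."""
    pages_to_visit = [link]
    seen = {link}  # avoid queueing the same URL twice
    pattern = re.compile('https?')
    while pages_to_visit:
        current_page = pages_to_visit.pop(0)
        page = requests.get(current_page)
        # page.text is the decoded body; str(page.content) would give "b'...'"
        for url in re.findall('<a href="([^"]+)">', page.text):
            # Resolve relative links against the current page
            url = urljoin(current_page, url)
            if pattern.match(url) and url not in seen:
                seen.add(url)
                pages_to_visit.append(url)
        yield current_page

webpage = get_pages('http://www.example.com')
for result in webpage:
    print(result)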

This example yields one page at a time, so the caller can act on each result as it arrives. In this case, you’re printing the link. Without a generator, you’d need to fetch and process in the same function or gather all the links before you started processing. Separating the crawl from the processing keeps the code cleaner and easier to test, and the first results are available before the crawl finishes.
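The earlier point about processing large files one chunk at a time can be sketched in the same way. This is a minimal illustration, not a definitive implementation: the helper name read_in_chunks and the chunk_size parameter are hypothetical, and an in-memory io.StringIO stands in for a real file.

```python
import io


def read_in_chunks(file_obj, chunk_size=1024):
    """Yield successive chunks from a file-like object (hypothetical helper)."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            # read() returns an empty string (or bytes) at end of file
            break
        yield chunk


# An in-memory stand-in for a large file
sample = io.StringIO("abcdefghij")
chunks = list(read_in_chunks(sample, chunk_size=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Only one chunk is held in memory at a time while looping, so the same function works regardless of the file size.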

Use xrange() instead of range().

Python 2 offered two functions, range() and xrange(), for producing the numbers a loop iterates over. The first of these built a list holding every number in the range, so its memory use grew linearly with the size of the range. The second, xrange(), returned a lazy xrange object. When looping with this object, each number is produced only on demand.

import sys
numbers = range(1, 1000000)
print(sys.getsizeof(numbers))

This prints 8000064 (on a 64-bit build of Python 2), whereas the same range of numbers with xrange prints 40. If your application is in Python 2, then swapping these functions can have a big impact on memory usage. The good news is that Python 3 implements the xrange() functionality by default. So, while there’s no xrange() function, the range() function already acts like this.
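A rough way to check the Python 3 behavior is to compare a range object with the list built from it. This is a sketch only: the exact byte counts depend on the interpreter build, so none are shown here, and the variable names are illustrative.

```python
import sys

lazy_numbers = range(1, 1000000)    # lazy range object in Python 3
eager_numbers = list(lazy_numbers)  # materializes every number in a list

lazy_size = sys.getsizeof(lazy_numbers)
eager_size = sys.getsizeof(eager_numbers)

# The range object stays small no matter how long the range is;
# the list grows with the number of elements.
print(lazy_size, eager_size)
```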

Learn itertools.

It’s been called a gem. If you haven’t heard of it, then you’re missing out on a great part of the Python standard library. You can use the functions in itertools to create code that’s fast, memory efficient, and elegant. Dive into the documentation, and look for tutorials to get the most out of this library. One example is the permutations function. Let’s say you wanted to generate all the permutations of ["Alice", "Bob", "Carol"].

import itertools
# Avoid naming the variable "iter", which would shadow the built-in iter()
perms = itertools.permutations(["Alice", "Bob", "Carol"])
list(perms)

This function will return all possible permutations:

[('Alice', 'Bob', 'Carol'),
 ('Alice', 'Carol', 'Bob'),
 ('Bob', 'Alice', 'Carol'),
 ('Bob', 'Carol', 'Alice'),
 ('Carol', 'Alice', 'Bob'),
 ('Carol', 'Bob', 'Alice')]

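Because permutations() returns an iterator, each tuple is generated only when requested; wrapping it in list() is what materializes all of them at once.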
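To see the lazy behavior of itertools, you can take just the first few permutations with itertools.islice instead of building the full list. This is a small sketch; the variable names are illustrative.

```python
import itertools

names = ["Alice", "Bob", "Carol"]
perms = itertools.permutations(names)

# Pull only the first two permutations; the rest are not computed yet
first_two = list(itertools.islice(perms, 2))
print(first_two)  # [('Alice', 'Bob', 'Carol'), ('Alice', 'Carol', 'Bob')]

# The iterator resumes where it left off
remaining = list(perms)
print(len(remaining))  # 4
```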

Using the enumerate() function.

The enumerate() function adds a counter to an iterable object. An iterable is an object that has an __iter__ method which returns an iterator, or a __getitem__ method that accepts sequential indexes starting from zero and raises an IndexError when the indexes are no longer valid. A typical use of enumerate() is to loop over a list and keep track of the index. For this, we could use a count variable, but Python gives us a nicer syntax with enumerate().

# First prepare a tuple of strings

subjects = ('Python', 'Coding', 'Tips')

for i, subject in enumerate(subjects):
    print(i, subject)

# Output:

    0 Python
    1 Coding
    2 Tips
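For comparison, here is a sketch of the manual count variable mentioned above next to enumerate(), plus the optional start argument for counting from 1. The list and variable names are illustrative.

```python
subjects = ['Python', 'Coding', 'Tips']

# Manual counter: works, but the bookkeeping is easy to get wrong
count = 0
manual = []
for subject in subjects:
    manual.append((count, subject))
    count += 1

# enumerate() produces the same (index, item) pairs
with_enumerate = list(enumerate(subjects))

# start=1 begins the counter at 1 instead of 0
numbered = list(enumerate(subjects, start=1))

print(manual == with_enumerate)  # True
print(numbered)  # [(1, 'Python'), (2, 'Coding'), (3, 'Tips')]
```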

Running Python programs from the Python interpreter.

The Python interactive interpreter is very easy to use. You can take your first steps in programming and try any Python command. You type commands at the Python console one by one, and the answer is immediate. You can start the Python console by issuing the command:

# start python console

$ python
>>> <type commands here>

In this article, all code that starts at the >>> symbol is meant to be typed at the Python prompt. Also remember that Python takes indentation very seriously, so if you receive an error that mentions tabs or indentation (such as TabError or IndentationError), correct the spacing.

Running Python scripts.

On most UNIX systems, you can run Python scripts from the command line in the following manner.

# run python script

$ python MyFirstPythonScript.py
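As an illustration of what such a script might contain, here is a minimal sketch. The contents are hypothetical (the article does not show MyFirstPythonScript.py); the main() function and the greeting are assumptions made for the example.

```python
"""Hypothetical contents for MyFirstPythonScript.py."""
import sys


def main(argv):
    # argv[0] is the script name; anything after it is a command-line argument
    name = argv[1] if len(argv) > 1 else "World"
    message = "Hello, {}!".format(name)
    print(message)
    return message


# This guard runs main() when the file is executed as a script,
# but not when it is imported as a module.
if __name__ == "__main__":
    main(sys.argv)
```

Running it as python MyFirstPythonScript.py Python would pass "Python" as argv[1].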

Virtualenv

Another important Python tool is virtualenv, short for "virtual environment". To test code under different conditions, you would normally have to change the global Python environment. One of the key benefits of sandboxing your Python environment is that you can easily test the same code under different Python versions and package dependencies. To install virtualenv, you need pip first. You can then create and activate an environment as follows:

$ easy_install pip
$ pip install virtualenv
$ virtualenv python-workspace
$ cd python-workspace
$ source ./bin/activate
$ python
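Python 3 also ships a venv module in the standard library that creates similar sandboxes. The sketch below uses venv rather than virtualenv itself, so treat it as a related technique, not the same tool. The create_sandbox helper is hypothetical, the environment is built in a temporary directory, and pip is skipped (with_pip=False) to keep the example quick.

```python
import os
import tempfile
import venv


def create_sandbox(parent_dir, name="python-workspace"):
    """Create a virtual environment under parent_dir (hypothetical helper)."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False skips bootstrapping pip into the new environment
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_sandbox(tmp)
    # venv writes a pyvenv.cfg file at the root of each environment
    has_config = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    print(has_config)
```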

Pip

Most people already know pip, but it is essential if you are just starting with Python. Sometimes you need to inspect the source of a package before installing it, for example when you are moving to a newer version of some package. Older versions of pip let you download, unpack, and install in separate steps:

$ pip install --download sqlalchemy_download sqlalchemy
$ pip install --no-install sqlalchemy
$ pip install --no-download sqlalchemy

These flags have been removed from current pip releases; there, pip download -d sqlalchemy_download sqlalchemy fetches the archive without installing it.

If you want to install the bleeding-edge version of a package, you can install it directly from its Git or Subversion repository:

$ pip install git+https://github.com/simplejson/simplejson.git
$ pip install svn+svn://svn.zope.org/repos/main/zope.interface/trunk

JSON-esque

Python has plenty of lesser-known idioms that reward a little time spent exploring. One of them is the JSON-esque nested dictionary: you can create nested dictionaries without explicitly creating sub-dictionaries, because they come into existence as you reference them. The example relies on a small tree() helper that the listing does not show; the version below, built on collections.defaultdict, is one common way to write it:

import json
from collections import defaultdict

def tree():
    return defaultdict(tree)

users = tree()
users['harold']['username'] = 'hrldcpr'
users['handler']['username'] = 'matthandlersux'

Now you can print the above as JSON with:

print(json.dumps(users))

And it will look like this:

{"harold": {"username": "hrldcpr"}, "handler": {"username": "matthandlersux"}}
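One consequence of sub-dictionaries appearing on reference is that merely reading a missing key creates it. The following sketch shows this, assuming tree() is built on collections.defaultdict (an assumption, since the listing itself does not define it); the 'ghost' key is purely illustrative.

```python
import json
from collections import defaultdict


def tree():
    # Each missing key gets a new tree, so nesting can go arbitrarily deep
    return defaultdict(tree)


users = tree()
users['harold']['username'] = 'hrldcpr'

before = 'ghost' in users  # membership tests do not create keys
users['ghost']             # a plain lookup does create an empty subtree
after = 'ghost' in users

as_json = json.dumps(users)
print(before, after)  # False True
print(as_json)        # {"harold": {"username": "hrldcpr"}, "ghost": {}}
```

Using the in operator or users.get(key) is a way to check for a key without creating it.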

Merging Python and Shell Scripts

This is something you can’t easily do with C or C++. If you work with open-source tools, you probably use Linux as your main operating system, or at least dual-boot it, and most Linux distributions already include Python. Because Python and the shell work well together, you can write a single file that runs as a normal Unix shell script and as interpreted Python code at the same time. The trick relies on quoting: four quote characters are an empty string to the shell, but to Python they open a triple-quoted string whose first character is a quote. The shell therefore runs the lines inside that string, while Python treats them as a string literal. Remember that the first string in a script is stored as the module’s docstring, but beyond that the Python interpreter simply ignores it. An example follows. The three lines after the shebang are not in the original listing; they are reconstructed from the description above as an assumption, and the print calls are written for Python 3:

#!/bin/sh
"""":
exec python "$0" "$@"
"""

doc = """
Demonstrate how to mix Python + shell script.
"""

import sys
print("Hello World!")
print("This is Python", sys.version)
print("This is my argument vector:", sys.argv)
print("This is my doc string:", doc)
sys.exit(0)