Whether you want to do an alignment of protein or nucleotide sequences, Biopython offers a handy tool for a “quick and dirty” job: the pairwise2 module. It allows for global/local alignment, using custom-built matrices, predefined ones, or none at all, and an array of other options that truly make this a very very flexible tool.
To those concerned with speed issue, pairwise2 has some parts coded in C which will be used unless they weren’t compiled properly. So, on to the actual code. To compute an alignment between two sequences, we need only 10 lines of code.
from Bio import pairwise2 from Bio.SubsMat import MatrixInfo as matlist matrix = matlist.blosum62 gap_open = -10 gap_extend = -0.5 # Only using the first 60 aas for visualization reasons.. p53_human = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP" p53_mouse = "MEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEALRV" alns = pairwise2.align.globalds(p53_human, p53_mouse, matrix, gap_open, gap_extend) top_aln = alns aln_human, aln_mouse, score, begin, end = top_aln print aln_human+'\n'+aln_mouse MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP---- MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF-EGPSEALRV # Comparing with EBI NEEDLE output MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP------ MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF---EGPSEALRV
A list of the predefined matrices available for this is under another module, SubsMat.MatrixInfo:
benner6, benner22, benner74, blosum100, blosum30, blosum35, blosum40, blosum45, blosum50, blosum55, blosum60, blosum62, blosum65, blosum70, blosum75, blosum80, blosum85, blosum90, blosum95, feng, fitch, genetic, gonnet, grant, ident, johnson, levin, mclach, miyata, nwsgappep, pam120, pam180, pam250, pam30, pam300, pam60, pam90, rao, risler, structure
The function name, globalds, is a custom built name that already indicates 1) which kind of alignment we are performing and 2) which scoring scheme we are using. Quoting the documentation of the function:
The names of the alignment functions in this module follow the convention XX where is either "global" or "local" and XX is a 2 character code indicating the parameters it takes. The first character indicates the parameters for matches (and mismatches), and the second indicates the parameters for gap penalties. The match parameters are CODE DESCRIPTION x No parameters. Identical characters have score of 1, otherwise 0. m A match score is the score of identical chars, otherwise mismatch score. d A dictionary returns the score of any pair of characters. c A callback function returns scores. The gap penalty parameters are CODE DESCRIPTION x No gap penalties. s Same open and extend gap penalties for both sequences. d The sequences have different open and extend gap penalties. c A callback function returns the gap penalties.
This tool is probably not suitable for very accurate alignments, as shown by comparison with NEEDLE. Yet, it serves as a very practical tool that can easily be included in a script without further dependencies (like Clustal, or Needle) and few hassle.