biseqt.pw module

biseqt.pw.lib

The loaded shared object for pwlib. All functions defined in the header file are accessible through this object. This object is automatically populated upon loading this module (cf. setup_ffi()) and users never have to manipulate it.

biseqt.pw.ffi

The main FFI instance used throughout this module. This object is automatically populated upon loading this module (cf. setup_ffi()) and users never have to manipulate it.

biseqt.pw.setup_ffi()[source]

Instantiates an FFI object as ffi and loads the shared object for pwlib into lib. This function is automatically called when this module loads.

Note

CFFI has issues with loading macros as they are defined in a header file. For this reason, and since the macros are not used in Python code, any line of the header file that begins with #define is ignored. As a consequence, multiline macros will not work, because their continuation lines are not removed.
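The effect of this filtering can be illustrated with a minimal sketch. The function name strip_defines and the sample header below are hypothetical and are not part of biseqt; the sketch only mimics the rule stated above (drop lines that begin with #define) and shows why a multiline macro leaves stray text behind.

```python
def strip_defines(header_text):
    """Drop every line that begins with #define, keep everything else."""
    kept = [
        line for line in header_text.splitlines()
        if not line.startswith('#define')
    ]
    return '\n'.join(kept)


# A made-up header: one single-line macro and one multiline macro.
header = '''\
#define MAX_LEN 100
typedef struct { int length; } seq;
#define SWAP(a, b) \\
    do { int t = a; a = b; b = t; } while (0)
int seq_len(seq* s);
'''

cleaned = strip_defines(header)
# The continuation line of SWAP survives the filter, which is why a
# header with multiline macros cannot be parsed afterwards.
print(cleaned)
```

The single-line macro disappears cleanly, while the body of SWAP remains in the output as a dangling statement.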

biseqt.pw.STD_MODE

Standard alignment type; time and memory complexity is quadratic in sequence lengths.

biseqt.pw.BANDED_MODE

Banded alignment type; time and memory complexity is linear in sequence lengths with a constant proportional to band width. This mode is incompatible with local alignments.
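To get a feel for the difference between the two modes, the following sketch counts table cells. It assumes that the dynamic programming table has (n + 1) x (m + 1) cells and that the diagonal of cell (i, j) is j - i; both conventions and the function names are illustrative and are not taken from pwlib.

```python
def table_cells(n, m):
    """Number of cells in a full (n + 1) x (m + 1) table."""
    return (n + 1) * (m + 1)


def band_cells(n, m, d_min, d_max):
    """Number of cells (i, j) with d_min <= j - i <= d_max."""
    total = 0
    for i in range(n + 1):
        lo = max(0, i + d_min)   # first column of row i inside the band
        hi = min(m, i + d_max)   # last column of row i inside the band
        if hi >= lo:
            total += hi - lo + 1
    return total


full = table_cells(1000, 1000)
banded = band_cells(1000, 1000, -10, 10)
print(full, banded)
```

For a fixed band width the banded count grows proportionally to the sequence length, whereas the full table grows with the product of the two lengths.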

biseqt.pw.GLOBAL

Standard global alignment problem, i.e. Needleman-Wunsch.

biseqt.pw.LOCAL

Standard local alignment problem, i.e. Smith-Waterman.
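The difference between the global and local problems can be sketched in pure Python. This is not how pwlib computes scores: the sketch uses a single linear gap score (the parameter gap is an assumption, whereas the library takes separate gap open and gap extend scores) and only returns the optimal score.

```python
def alignment_score(origin, mutant, local=False,
                    match=1, mismatch=0, gap=-1):
    """Optimal global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    n, m = len(origin), len(mutant)
    # table[i][j]: best score for aligning origin[:i] with mutant[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignments must pay for leading gaps.
        for i in range(1, n + 1):
            table[i][0] = i * gap
        for j in range(1, m + 1):
            table[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subst = match if origin[i - 1] == mutant[j - 1] else mismatch
            score = max(
                table[i - 1][j - 1] + subst,  # match or substitution
                table[i - 1][j] + gap,        # deletion
                table[i][j - 1] + gap,        # insertion
            )
            if local:
                # Local alignments may restart anywhere at score 0.
                score = max(score, 0)
            table[i][j] = score
            best = max(best, score)
    # Global: score of the last cell; local: best cell anywhere.
    return best if local else table[n][m]
```

The global variant reads the score from the bottom-right cell, while the local variant clamps scores at zero and takes the maximum over the whole table.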

biseqt.pw.START_ANCHORED

Standard local alignment demanding that it begins at the start of frame of both sequences.

biseqt.pw.END_ANCHORED

Standard local alignment demanding that it ends at the end of frame of both sequences.

biseqt.pw.OVERLAP

Standard suffix-prefix alignment in any direction; this includes alignments where a prefix of either sequence matches a suffix of the other and alignments where one sequence is a substring of the other.

biseqt.pw.START_ANCHORED_OVERLAP

Standard suffix-prefix alignment demanding that it begins at the start of frame of both sequences.

biseqt.pw.END_ANCHORED_OVERLAP

Standard suffix-prefix alignment demanding that it ends at the end of frame of both sequences.

biseqt.pw.B_GLOBAL

Banded global alignment problem; may not be well-defined (end points of the table may not lie in band).
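Whether a banded global problem is well-defined can be checked before solving. The helper below is a hypothetical sketch, assuming that the diagonal of table cell (i, j) is j - i and that the band is given as an inclusive range of diagonals; the convention actually used by pwlib may differ.

```python
def endpoints_in_band(n, m, d_min, d_max):
    """True if both (0, 0) and (n, m) lie on diagonals in [d_min, d_max]."""
    start_ok = d_min <= 0 <= d_max        # cell (0, 0) is on diagonal 0
    end_ok = d_min <= m - n <= d_max      # cell (n, m) is on diagonal m - n
    return start_ok and end_ok
```

For sequences of lengths 100 and 120, a band of diagonals -5 to 5 does not contain the end point of the table, whereas a band of -5 to 25 does.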

biseqt.pw.B_OVERLAP

Banded suffix-prefix alignment problem in either direction including substring alignments.

biseqt.pw.B_LOCAL

Banded local alignment problem.

class biseqt.pw.Aligner(origin, mutant, **kw)[source]

Bases: object

Provides a context that solves a pairwise alignment problem. Memory is allocated upon entering the context and is freed upon leaving it. All alignment calculations (solve() and traceback()) are explicitly invoked by the caller.

Parameters:
Keyword Arguments:
 
  • origin_range (tuple) – The range (frame) of the original (“from”) sequence to be aligned; cf. alnframe::origin_range.
  • mutant_range (tuple) – The range (frame) of the mutant (“to”) sequence to be aligned; cf. alnframe::mutant_range.
  • alnmode (int) – Either STD_MODE or BANDED_MODE; default is STD_MODE; cf. alnprob::mode.
  • alntype (int) – One of the allowed alignment types for the given alnmode, see ALN_TYPES; default is GLOBAL; cf. std_alnparams and banded_alnparams.
  • subst_scores (list) – The overriding definition of the substitution score matrix; cf. alnscores::subst_scores. Default is None in which case the score matrix is populated based on match and mismatch scores.
  • match_score (float) – If subst_scores is not given, this parameter is used to populate the diagonal entries of the substitution score matrix; default is 1.
  • mismatch_score (float) – If subst_scores is not given, this parameter is used to populate the off-diagonal entries of the substitution score matrix; default is 0.
  • go_score (float) – The gap open score; cf. alnscores::gap_open_score. Default is 0.
  • ge_score (float) – The gap extend score; cf. alnscores::gap_extend_score. Default is 0.
  • max_new_mins (int) – Maximum number of tolerated new minima encountered in the running score of an alignment; cf. alnprob::max_new_mins. Default is -1 in which case no such constraint is imposed.
  • diag_range (tuple) – In BANDED_MODE, this argument specifies the lower and upper limits on the diagonals of the dynamic programming table to be populated; cf. banded_alnparams.
  • min_score (float) – The minimum required score for an alignment to be reported; default is float("-inf") in which case all alignments are reported.
solve()[source]

Populates the regions of interest in the dynamic programming table and reports the optimal score, if any. This function must be called within the context, cf. __enter__(), __exit__().

Returns:The score of the optimal alignment or None if none found.
Return type:float
table_scores()[source]

Returns a 2D array of scores calculated by solve().

traceback()[source]

Traces back the optimal alignment identified by solve(). This function must be called within the context and after solve(); otherwise no alignment is found.

Returns:The optimal alignment or None if none found.
Return type:Alignment
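The calling order described above (enter the context, call solve(), then traceback()) can be wrapped in a small helper. This is a sketch under assumptions: the name best_alignment is hypothetical, and the code assumes that __enter__() returns the Aligner instance so that it can be bound with "as". The sequences passed in are expected to be sequence.Sequence objects constructed elsewhere.

```python
def best_alignment(origin, mutant, **kw):
    """Solve one alignment problem and return (score, alignment)."""
    # Imported here because loading biseqt.pw also loads the pwlib
    # shared object (see setup_ffi()).
    from biseqt.pw import Aligner

    # Assumption: __enter__() returns the Aligner instance itself.
    # Memory is allocated on entry and freed when the block is left.
    with Aligner(origin, mutant, **kw) as aligner:
        score = aligner.solve()
        if score is None:
            # No alignment satisfies the constraints (e.g. min_score).
            return None, None
        # traceback() must run inside the context and after solve().
        alignment = aligner.traceback()
    return score, alignment
```

Keyword arguments such as alntype, alnmode, or min_score are passed through to Aligner unchanged. The returned Alignment remains usable after the context has been left only if it does not depend on memory owned by the context, which this sketch takes for granted.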
calculate_score(alignment)[source]

Scores a given alignment for origin and mutant.

Parameters:alignment (Alignment) – The alignment to be evaluated.

Returns:The score of the alignment based on subst_scores, go_score, and ge_score.
Return type:float
class biseqt.pw.Alignment(origin, mutant, transcript, score=None, origin_start=0, mutant_start=0)[source]

Bases: object

Represents a pairwise alignment.

origin

sequence.Sequence – The original (“from”) sequence.

mutant

sequence.Sequence – The mutant (“to”) sequence.

alphabet

sequence.Alphabet – The shared alphabet of origin and mutant.

transcript

str – The sequence of edit operations that transforms origin to mutant. The alphabet for edit operations is M for match, S for substitution (mismatch), and I and D for insertion and deletion.

origin_start

int – Starting position on the original sequence; default is 0.

mutant_start

int – Starting position on the mutant sequence; default is 0.

score

float – The score of the alignment; default is None.

classmethod projected_len(transcript, on='origin')[source]

Calculates the projected length of a given transcript on either of the involved sequences. For instance:

>>> biseqt.Alignment.projected_len('MSI', on='origin')
2
>>> biseqt.Alignment.projected_len('MSI', on='mutant')
3
Parameters:transcript (str) – A sequence of edit operations, cf. Alignment.transcript.
Keyword Arguments:
 on (str) – Either of origin or mutant.
Returns:The projected length of the edit transcript.
Return type:int
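The projection rule behind this method can be reproduced in a few lines. The standalone function below is an illustrative sketch, not the library's implementation: M and S consume one letter of both sequences and, as the example above shows, I consumes a letter of the mutant only; that D consumes a letter of the origin only is assumed by symmetry.

```python
def projected_len(transcript, on='origin'):
    """Number of letters of `on` covered by an edit transcript."""
    if on not in ('origin', 'mutant'):
        raise ValueError("on must be 'origin' or 'mutant'")
    # Operations that advance the position on the chosen sequence.
    consuming = 'MSD' if on == 'origin' else 'MSI'
    return sum(1 for op in transcript if op in consuming)
```

With this rule the transcript 'MSI' covers 2 letters of the origin and 3 letters of the mutant, in agreement with the documented example.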
calculate_score(subst_scores, go_score, ge_score)[source]

Scores this alignment according to the given scoring scheme.

Parameters:
  • subst_scores (list) – The substitution score matrix, cf. Aligner.subst_scores.
  • go_score (float) – The gap open score; cf. Aligner.go_score.
  • ge_score (float) – The gap extend score; cf. Aligner.ge_score.
Returns:

The score of the alignment for origin and mutant based on given scores.

Return type:

float
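How a transcript might be scored under such a scheme is sketched below. Several points are assumptions and not confirmed by this documentation: sequences are plain strings over the hypothetical alphabet 'ACGT', subst_scores is a list of rows indexed by letter position, and a gap of length k contributes go_score + k * ge_score. The library's actual gap convention may differ.

```python
def uniform_subst_scores(size, match_score=1, mismatch_score=0):
    """Matrix with match_score on the diagonal, mismatch_score elsewhere."""
    return [
        [match_score if i == j else mismatch_score for j in range(size)]
        for i in range(size)
    ]


def score_transcript(origin, mutant, transcript, subst_scores,
                     go_score, ge_score, letters='ACGT'):
    """Score an edit transcript (M, S, I, D) for two plain strings."""
    i = j = 0            # current positions on origin and mutant
    score = 0
    previous = None
    for op in transcript:
        if op in 'MS':
            row = letters.index(origin[i])
            col = letters.index(mutant[j])
            score += subst_scores[row][col]
            i += 1
            j += 1
        elif op in 'ID':
            # Assumed affine model: opening a gap costs go_score once,
            # and every gap position costs ge_score.
            if op != previous:
                score += go_score
            score += ge_score
            if op == 'I':
                j += 1   # insertion consumes a letter of the mutant
            else:
                i += 1   # deletion consumes a letter of the origin
        else:
            raise ValueError('unknown edit operation: %r' % op)
        previous = op
    return score
```

The helper uniform_subst_scores mirrors the way match and mismatch scores populate the diagonal and off-diagonal entries of the substitution matrix when no explicit matrix is given.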

truncate_to_match()[source]
render_term(term_width=120, margin=0, colored=True)[source]

Renders a textual representation of the alignment.

Keyword Arguments:
 
  • term_width (int) – Terminal width used for wrapping; default is 120 and the smallest valid value is 30.
  • margin (int) – Length of the leading and trailing substrings to include from the original and mutant sequences; default is 0.
  • colored (bool) – Whether or not to use ANSI color codes in output; default is True.
Returns:

str