biseqt.pw module¶

biseqt.pw.lib¶: The loaded shared object for pwlib. All functions defined in the header file are accessible through this object. This object is automatically populated upon loading this module (cf. setup_ffi()) and users never have to manipulate it.

biseqt.pw.ffi¶: The main FFI instance used throughout this module. This object is automatically populated upon loading this module (cf. setup_ffi()) and users never have to manipulate it.

biseqt.pw.setup_ffi()[source]¶: Instantiates an FFI object as ffi and loads the shared object for pwlib into lib. This function is automatically called when this module loads.

Note

CFFI has issues with loading macros as they are defined in a header file. For this reason, and since we don’t use the macros in Python code, any line of the header file that begins with #define is ignored. As a consequence, multiline macros will not work.
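The filtering described in this note could look roughly like the following sketch. It only illustrates dropping `#define` lines before a header is handed to CFFI; the function name `strip_defines` and the sample header are hypothetical and are not taken from the biseqt sources.

```python
def strip_defines(header_text):
    """Drop every line that begins with #define (hypothetical helper)."""
    kept = [
        line for line in header_text.splitlines()
        if not line.startswith("#define")
    ]
    return "\n".join(kept)


# Hypothetical header: a single-line macro, a multiline macro, a declaration.
header = (
    "#define MAX_LEN 100\n"
    "#define SQUARE(x) \\\n"
    "    ((x) * (x))\n"
    "int solve(int n);\n"
)

cleaned = strip_defines(header)
# The continuation line of the multiline macro survives the filter, which is
# why multiline macros leave stray text in what is passed on to CFFI.
print(cleaned)
```

Only the first line of a multiline macro starts with `#define`, so its continuation lines remain in the filtered header.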

biseqt.pw.STD_MODE¶: Standard alignment type; time and memory complexity is quadratic in sequence lengths.

biseqt.pw.BANDED_MODE¶: Banded alignment type; time and memory complexity is linear in sequence lengths with a constant proportional to band width. This mode is incompatible with local alignments.

biseqt.pw.GLOBAL¶: Standard global alignment problem, i.e. Needleman-Wunsch.

biseqt.pw.LOCAL¶: Standard local alignment problem, i.e. Smith-Waterman.

biseqt.pw.START_ANCHORED¶: Standard local alignment demanding that it begins at the start of frame of both sequences.

biseqt.pw.END_ANCHORED¶: Standard local alignment demanding that it ends at the end of frame of both sequences.

biseqt.pw.OVERLAP¶: Standard suffix-prefix alignment in any direction; this includes alignments where a prefix of either sequence matches a suffix of the other and alignments where one sequence is a substring of the other.

biseqt.pw.START_ANCHORED_OVERLAP¶: Standard suffix-prefix alignment demanding that it begins at the start of frame of both sequences.

biseqt.pw.END_ANCHORED_OVERLAP¶: Standard suffix-prefix alignment demanding that it ends at the end of frame of both sequences.

biseqt.pw.B_GLOBAL¶: Banded global alignment problem; may not be well-defined (end points of the table may not lie in band).
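
To see why a banded global problem may be ill-defined, consider the following sketch. It assumes that cell (i, j) of the table lies on diagonal j - i, that the band is an inclusive range of diagonals, and that a global alignment must connect cell (0, 0) to the opposite corner of the table. The sign convention and the name `global_in_band` are assumptions, not the library's definitions.

```python
def global_in_band(origin_len, mutant_len, diag_range):
    """Check that both corners of the table lie inside the band.

    Assumes cell (i, j) lies on diagonal j - i and that diag_range is an
    inclusive (lower, upper) pair; biseqt's actual convention may differ.
    """
    lower, upper = diag_range
    start_diag = 0                       # cell (0, 0)
    end_diag = mutant_len - origin_len   # cell (origin_len, mutant_len)
    return lower <= start_diag <= upper and lower <= end_diag <= upper


# Lengths differ by 5: both corners fall inside a band of half-width 10.
print(global_in_band(100, 105, (-10, 10)))
# Lengths differ by 30: the end corner falls outside the same band.
print(global_in_band(100, 130, (-10, 10)))
```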

biseqt.pw.B_OVERLAP¶: Banded suffix-prefix alignment problem in either direction including substring alignments.

biseqt.pw.B_LOCAL¶: Banded local alignment problem.

class biseqt.pw.Aligner(origin, mutant, **kw)[source]¶

Bases: object

Provides a context that solves a pairwise alignment problem. Memory is allocated upon entering the context and is freed upon leaving it. All alignment calculations (solve() and traceback()) are explicitly invoked by the caller.

Parameters:

origin (sequence.Sequence) – The original (“from”) sequence.
mutant (sequence.Sequence) – The mutant (“to”) sequence.

Keyword Arguments:

origin_range (tuple) – The range of the original (“from”) sequence to be aligned; cf. alnframe::origin_range.
mutant_range (tuple) – The range of the mutant (“to”) sequence to be aligned; cf. alnframe::mutant_range.
alnmode (int) – One of STD_MODE or BANDED_MODE; default is STD_MODE; cf. alnprob::mode.
alntype (int) – One of the allowed alignment types for the given alnmode, see ALN_TYPES; default is GLOBAL; cf. std_alnparams and banded_alnparams.
subst_scores (list) – The overriding definition of the substitution score matrix; cf. alnscores::subst_scores. Default is None in which case the score matrix is populated based on match and mismatch scores.
match_score (float) – If subst_scores is not given, this parameter is used to populate the diagonal entries of the substitution score matrix; default is 1.
mismatch_score (float) – If subst_scores is not given, this parameter is used to populate the off-diagonal entries of the substitution score matrix; default is 0.
go_score (float) – The gap open score; cf. alnscores::gap_open_score. Default is 0.
ge_score (float) – The gap extend score; cf. alnscores::gap_extend_score. Default is 0.
max_new_mins (int) – Maximum number of tolerated new minima encountered in the running score of an alignment; cf. alnprob::max_new_mins. Default is -1 in which case no such constraint is imposed.
diag_range (tuple) – In BANDED_MODE, this argument specifies the upper and lower limits on the diagonals of the dynamic programming table to be populated; cf. banded_alnparams.
min_score (float) – The minimum required score for an alignment to be reported; default is float("-inf") in which case all alignments are reported.
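
As an illustration of how `match_score` and `mismatch_score` might populate the substitution score matrix when `subst_scores` is not given, here is a minimal sketch. The helper `default_subst_scores` and the list-of-lists layout are assumptions; the library's internal representation may differ.

```python
def default_subst_scores(alphabet_size, match_score=1, mismatch_score=0):
    """Build a square matrix: match_score on the diagonal, mismatch_score elsewhere."""
    return [
        [match_score if i == j else mismatch_score for j in range(alphabet_size)]
        for i in range(alphabet_size)
    ]


# A four-letter alphabet, rewarding matches and penalizing mismatches.
scores = default_subst_scores(4, match_score=1, mismatch_score=-1)
for row in scores:
    print(row)
```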

solve()[source]¶

Populates the regions of interest in the dynamic programming table and reports the optimal score, if any. This function must be called within the context, cf. __enter__(), __exit__().

Returns:	The score of the optimal alignment or None if none found.
Return type:	score (float)

table_scores()[source]¶: Returns a 2D array of scores calculated by solve().

traceback()[source]¶

Traces back the optimal alignment identified by solve(). This function must be called within the context and after solve(); otherwise no alignment is found.

Returns:	The optimal alignment or None if none found.
Return type:	Alignment
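
The calling sequence described above (enter the context, call solve(), then traceback()) can be sketched as follows. The sketch takes the aligner class as an argument so that it does not depend on how biseqt is installed. It assumes that entering the context yields the aligner itself, which this documentation does not state. The function name `best_alignment` is hypothetical.

```python
def best_alignment(aligner_cls, origin, mutant, **kw):
    """Solve and trace back inside the aligner's context.

    aligner_cls is expected to behave like biseqt.pw.Aligner; keyword
    arguments (alntype, go_score, ...) are passed through unchanged.
    Assumes __enter__ returns the aligner object.
    """
    with aligner_cls(origin, mutant, **kw) as aligner:
        score = aligner.solve()      # None if no alignment was found
        if score is None:
            return None
        # traceback() must run after solve() and before the context exits
        return aligner.traceback()
```

With biseqt installed, a call might look like `best_alignment(biseqt.pw.Aligner, origin, mutant, alntype=biseqt.pw.GLOBAL)`, where origin and mutant are sequence.Sequence objects.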

calculate_score(alignment)[source]¶

Scores a given alignment for origin and mutant.

Parameters:	alignment (Alignment) – The alignment to be evaluated.
Returns:	The score of the alignment based on `subst_scores`, `go_score`, and `ge_score`.
Return type:	float

class biseqt.pw.Alignment(origin, mutant, transcript, score=None, origin_start=0, mutant_start=0)[source]¶

Bases: object

Represents a pairwise alignment.

origin¶: sequence.Sequence – The original (“from”) sequence.

mutant¶: sequence.Sequence – The mutant (“to”) sequence.

alphabet¶: sequence.Alphabet – The shared alphabet of origin and mutant.

transcript¶: str – The sequence of edit operations that transforms origin to mutant. The alphabet for edit operations is M for match, S for substitution (mismatch), and I and D for insertion and deletion.

origin_start¶: int – Starting position on the original sequence; default is 0.

mutant_start¶: int – Starting position on the mutant sequence; default is 0.

score¶: float – The score of the alignment; default is None.

classmethod projected_len(transcript, on='origin')[source]¶

Calculates the projected length of a given transcript on either of the involved sequences. For instance:

>>> biseqt.Alignment.projected_len('MSI', on='origin')
2
>>> biseqt.Alignment.projected_len('MSI', on='mutant')
3

Parameters:	transcript (str) – A sequence of edit operations, cf. `Alignment.transcript`.
Keyword Arguments:	on (str) – Either `origin` or `mutant`; default is `origin`.
Returns:	The projected length of the edit transcript.
Return type:	int
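
The projection can be understood by counting which operations consume a letter of each sequence. The sketch below is an independent reimplementation for illustration, not the library's code. It assumes that M and S consume a letter of both sequences, D only of the origin, and I only of the mutant, which is consistent with the example above.

```python
def projected_len(transcript, on="origin"):
    """Count the edit operations that consume a letter of the chosen sequence."""
    if on == "origin":
        consuming = "MSD"   # deletions consume a letter of the origin only
    elif on == "mutant":
        consuming = "MSI"   # insertions consume a letter of the mutant only
    else:
        raise ValueError("on must be 'origin' or 'mutant'")
    return sum(1 for op in transcript if op in consuming)


print(projected_len("MSI", on="origin"))  # 2
print(projected_len("MSI", on="mutant"))  # 3
```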

calculate_score(subst_scores, go_score, ge_score)[source]¶

Scores this alignment according to the given scoring scheme.

Parameters:	subst_scores (list) – The substitution score matrix, cf. `Aligner.subst_scores`.
	go_score (float) – The gap open score; cf. `Aligner.go_score`.
	ge_score (float) – The gap extend score; cf. `Aligner.ge_score`.
Returns:	The score of the alignment for `origin` and `mutant` based on given scores.
Return type:	float
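
How the three score parameters could combine is sketched below. This is not the library's implementation: it assumes sequences are given as lists of letter indices into the score matrix, that the matrix is indexed as `subst_scores[origin_letter][mutant_letter]`, and that each maximal run of k identical gap operations contributes `go_score + k * ge_score`. The function name `score_transcript` is hypothetical.

```python
from itertools import groupby


def score_transcript(origin, mutant, transcript, subst_scores, go_score, ge_score):
    """Score an edit transcript under an assumed affine gap convention.

    origin and mutant are lists of letter indices into subst_scores. Each
    maximal run of k identical gap operations (I or D) is assumed to
    contribute go_score + k * ge_score; the library may count differently.
    """
    i = j = 0          # current positions in origin and mutant
    total = 0.0
    for op, run in groupby(transcript):
        k = len(list(run))
        if op in "MS":
            for _ in range(k):
                total += subst_scores[origin[i]][mutant[j]]
                i += 1
                j += 1
        elif op == "D":
            total += go_score + k * ge_score
            i += k
        elif op == "I":
            total += go_score + k * ge_score
            j += k
        else:
            raise ValueError("unknown edit operation: %r" % op)
    return total


# Two-letter alphabet, match = 1, mismatch = -1, gap open = -2, gap extend = -1.
subst = [[1, -1], [-1, 1]]
print(score_transcript([0, 1, 1], [0, 0, 1, 0], "MSMI", subst, -2, -1))
```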

truncate_to_match()[source]¶

render_term(term_width=120, margin=0, colored=True)[source]¶

Renders a textual representation of the alignment.

Keyword Arguments:
	term_width (int) – Terminal width used for wrapping; default is 120 and the smallest valid value is 30.
	margin (int) – Length of leading and trailing substrings to include in original and mutant sequences; default is 0.
	colored (bool) – Whether or not to use ANSI color codes in output; default is True.
Returns:	str