pwlib (C component)¶
Documentation for data structrs, constants, and functionsdefined in the C component.
Constants¶
-
enum
alnmode
¶ Indicates which of the pairwise alignment algorithms should be used, see dptable.
There are two available variations of the standard dynamic programming algorithm.
First, the “standard” mode, in which time and space are quadratic in sequence lengths. Second, the “banded” mode, in which both time and space complexity are linear in sequence lengths (in fact, the shorter of the sequence lengths). This mode is not well-posed for local alignments.
Values:
-
STD_MODE
¶ Standard alignment algorithm.
-
BANDED_MODE
¶ Banded alignment algorithm.
-
-
enum
std_alntype
¶ Defines the boundary conditions (i.e where an alignment can start and where it can end) on the dynamic programming table for standard alignments.
Values:
-
GLOBAL
¶ Solve for optimal global alignments, equivalent to the Needleman-Wunsch algorithm.
-
LOCAL
¶ Solve for optimal local alignments, equivalent to the Smith-Waterman algorithm.
-
START_ANCHORED
¶ Solve for optimal local alignments starting at the start of alignment frame for both sequences.
-
END_ANCHORED
¶ Solve for optimal local alignments ending at the end of alignment frame for both sequences.
-
OVERLAP
¶ Solve for suffix-prefix alignments of either direction (including substring alignments).
-
START_ANCHORED_OVERLAP
¶ Solve or suffix-prefix alignments starting at the start of alignment frame for both sequences.
-
END_ANCHORED_OVERLAP
¶ Solve or suffix-prefix alignments ending at the end of alignment frame for both sequences.
-
-
enum
banded_alntype
¶ Defines the boundary conditions (i.e where an alignment can start and where it can end) on the dynamic programming table for banded alignments.
Values:
-
B_GLOBAL
¶ Solve for banded global alignments.
-
B_LOCAL
¶ Solve for banded local alignments.
-
B_OVERLAP
¶ Solve for banded suffix-prefix alignments of either direction (including substring alignments, if within band).
-
Functions¶
-
int
dptable_init
(dptable * T)¶ Given a dptable containing the alignment problem definition and the mode of alignment, the table’s dimensions are calculated and the appropriate amount of memory is allocated with initialized dpcell entries.
In banded mode the region of interest, which is not rectangular in the default (x,y) coordinate system, is mapped to an alternative coordinate system (refered to as (d,a)-coordinates).
- Return
- 0 if successful and -1 if an error occurs.
- Parameters
T
: Table to be initialized. The only fields that must be already populated are the alignment mode and the problem definition.
-
void
dptable_free
(dptable * T)¶ Frees the allocated memory for the cells of a given dptable.
- Parameters
T
: the dynamic programming table containing all the info.
-
alignment*
dptable_traceback
(dptable * T, intpair end)¶ Traces back an alignment (and not all alignments with identical scores) from a given cell all the way back to a cell with a
NULL
base (i.e an alignment start cell).- Note
- Finding more than one optimal alignment is a nontrivial search problem, and requires some sort of global state keeping to avoid convoluted recursions. I couldn’t get it right in the first go; leave for later.
- Parameters
T
: The solved dynamic programming table.end
: The desired ending point of alignment which becomes the starting point of traceback.
-
intpair
dptable_solve
(dptable * T)¶ Given an alignment with allocated DP table, solves the alignment problem (i.e populates the alignment table) and returns the optimal ending point for the alignment.
The optimal score and edit transcript can then be obtained by using
dptable_traceback
. Half of the constraints imposed by an alignment type are implemented here where we decide what positions on the table can be starting positions of alignments. The other half of the constraints concerning the ending position of the alignment on the table are enforced indptable_traceback
.- Return
- The optimal cell for the alignment to end at or {-1,-1} if error.
- Parameters
T
: The dynamic programming table.
-
void*
safe_malloc
(int n, char * file, int line)¶ Safe handler for memory allocation.
If memory cannot be allocated prints an error message with coordinates of caller and crashes.
Data Structures¶
-
struct
intpair
¶ Any pair of integers.
-
struct
alnscores
¶ Groups together the parameters for a pairwise alignment algorithm used to solve an alignment problem.
Public Members
-
double**
subst_scores
¶ The substitution score matrix, ordered the same as letters in the alphabet.
-
double
gap_open_score
¶ The gap open score in the affine gap penalty scheme, use 0 for a linear gap model.
-
double
gap_extend_score
¶ The gap extension score, or the linear gap score when using a linear gap model.
-
double**
-
struct
alnframe
¶ Defines the skeleton of a pairwise alignment problem: two sequences and their start/end of frame positions.
-
struct
std_alnparams
¶ Groups together all parameters of a standard pairwise alignment problem: for now only an alignment type.
Public Members
-
std_alntype
type
¶ The subtype of standard algorithms.
-
std_alntype
-
struct
banded_alnparams
¶ Groups together all parameters of a banded pairwise alignment problem: alignment type, scores, and bounding diagonals.
Public Members
-
banded_alntype
type
¶ The subtype of banded algorithms.
-
int
dmin
¶ The upper diagonal bounding the band.
-
int
dmax
¶ The lower diagonal bounding the band.
-
banded_alntype
-
struct
alnprob
¶ The definition of a pairwise alignment problem the contents of which is enough to define and solve an alignment.
Public Members
-
int
max_new_mins
¶ Maximum number of times a new minimum score is tolerated before alignment is aborted (only operative if positive).
-
std_alnparams*
std_params
¶ The parameters for standard algorithms.
-
banded_alnparams*
banded_params
¶ The parameters for banded algorithms.
-
int
-
struct
alnchoice
¶ The fundamental unit of solving pairwise alignment problems.
Each cell of the dynamic programming table corrsponds to a subproblem of the overall alignment problem. For each subproblem, a “choice” corresponds to a single edit operation (which applies to the very last positions of the subproblem) and a “base” which is a choice residing at the cell to which the op points (the “base” is
NULL
for the choice of starting an alignment at a given position).Each choice can thus be traced all the way back via the chain of “base” pointers to the beginning of the alignment to which it belongs. It is the total score of this alignment that is stored at each cell for each choice.
This architecture allows us to easily capture path-dependent scoring schemes, for example, the affine gap penalty.
Public Members
-
char
op
¶ Either of
M
for match,origin
for substitution,I
for insertion, andD
for deletion.
-
double
score
¶ The overall score of the alignment corresponding to the chain of bases leading to here.
-
struct alnchoice*
base
¶ The choice for the consequent subproblem on which this choice and its score depends.
-
int
mins_cd
¶ The countdown of the number of new score minima encountered along the path leading to this choice.
-
int
cur_min
¶ The current minimum score along the path leading to this choice.
-
char
-
struct
dpcell
¶ A single cell of the dynamic programming table.
Each cell has an array of optimal choices linking to choices in dependent cells.
-
struct
dptable
¶ The full dynamic programming table for a standard or banded alignment.
-
struct
alignment
¶ Represents a solution to an alignment problem.