biseqt.sequence module

class biseqt.sequence.Alphabet(letters)

Bases: object

A sequence alphabet.

_letters: tuple – The letters in the alphabet. All getitem operations (i.e. indexing and slicing) are delegated to this tuple. This attribute should be considered read-only.

_letlen: int – The length of each letter in the alphabet when represented as a string. This attribute should be considered read-only.

Parameters:	letters (iterable) – The elements of this iterable must be hashable, i.e. usable as dictionary keys, and must support `len()`. Typically, they are single-character strings.

letter_to_idx(letters)

Translates provided letters to the integer sequence corresponding to the index of each letter in this alphabet.

Parameters:	letters (iterable) – The letters to be translated to integer indices. Each element retrieved through iteration should be an element in `_letters`.
Returns:	tuple
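
The translation described above can be sketched without the library. This is a minimal stand-alone illustration, not the library's implementation: the module-level `letters` tuple stands in for `_letters`, and the free function `letter_to_idx` is a hypothetical helper that mirrors the documented behaviour.

```python
# Stand-alone sketch; `letters` plays the role of Alphabet._letters.
letters = tuple('ACGT')


def letter_to_idx(seq_letters, letters=letters):
    """Map each letter to its position in `letters` (hypothetical helper)."""
    # Build the lookup once so each translation is a dictionary access.
    index_of = {letter: idx for idx, letter in enumerate(letters)}
    return tuple(index_of[letter] for letter in seq_letters)


indices = letter_to_idx('AGGGT')
print(indices)  # (0, 2, 2, 2, 3)
```

A letter that is not in the alphabet raises `KeyError` in this sketch; how the library itself reports such input is not stated here.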

parse(string)

Given a string representation of a sequence, returns the corresponding Sequence object.

Parameters:	string (str) – The raw sequence represented as a string.
Returns:	Sequence
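
One plausible reading of parsing, sketched below, is that the raw string is cut into chunks of `_letlen` characters and each chunk is looked up in the alphabet. That chunking rule is an assumption based on the description of `_letlen`, and `parse_to_indices` is a hypothetical stand-alone helper that returns a tuple of indices rather than a Sequence.

```python
def parse_to_indices(string, letters):
    """Split `string` into fixed-width letters and return their indices.

    Assumes every letter in `letters` has the same string length.
    """
    letlen = len(letters[0])
    if len(string) % letlen != 0:
        raise ValueError('string length is not a multiple of the letter length')
    index_of = {letter: idx for idx, letter in enumerate(letters)}
    # Walk the string in steps of `letlen` and translate each chunk.
    chunks = (string[i:i + letlen] for i in range(0, len(string), letlen))
    return tuple(index_of[chunk] for chunk in chunks)


print(parse_to_indices('AGGGT', tuple('ACGT')))    # (0, 2, 2, 2, 3)
print(parse_to_indices('A1B2A1', ('A1', 'B2')))    # (0, 1, 0)
```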

transform(seq, mappings={})

Transforms the given sequence to another sequence in the same alphabet according to provided letter-to-letter mappings.

Parameters:

seq (Sequence) – The original sequence.
mappings (list|dict) – If a dictionary is given, each entry represents a translation rule from the key to the value. If a list is given, each entry must have two elements and is taken to represent a bidirectional translation rule between those two elements. In both forms, each element can be either a letter in string format or an integer giving the position of the letter in the alphabet.

Returns:

Sequence

For example, to get the complement of a DNA sequence:

>>> from biseqt.sequence import Alphabet
>>> A = Alphabet('ACGT')
>>> S = A.parse('AGGGT')
>>> print A.transform(S, mappings=['AT', 'CG'])
'TCCCA'

whereas to get the same effect with a dictionary:

>>> mappings = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
>>> print A.transform(S, mappings)
'TCCCA'
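
The two mapping forms can be reduced to a single index-to-index dictionary before they are applied. The following is a stand-alone sketch under stated assumptions, not the library's code: `normalize_mappings` and `transform_indices` are hypothetical helpers, sequences are plain tuples of indices, and letters without a rule are assumed to stay unchanged.

```python
letters = tuple('ACGT')


def normalize_mappings(mappings, letters=letters):
    """Return an index -> index dict from a dict or a list of pairs."""
    def to_idx(item):
        # Each element may be a letter or already an integer position.
        return item if isinstance(item, int) else letters.index(item)

    rules = {}
    if isinstance(mappings, dict):
        # Dictionary entries are one-directional: key -> value.
        for src, dst in mappings.items():
            rules[to_idx(src)] = to_idx(dst)
    else:
        # List entries are bidirectional: a <-> b.
        for a, b in mappings:
            rules[to_idx(a)] = to_idx(b)
            rules[to_idx(b)] = to_idx(a)
    return rules


def transform_indices(contents, mappings, letters=letters):
    """Apply the rules; indices without a rule are kept (an assumption)."""
    rules = normalize_mappings(mappings, letters)
    return tuple(rules.get(idx, idx) for idx in contents)


contents = (0, 2, 2, 2, 3)  # 'AGGGT' in the alphabet 'ACGT'
complemented = transform_indices(contents, ['AT', 'CG'])
print(''.join(letters[i] for i in complemented))  # TCCCA
```

Normalizing first keeps the per-letter step a single dictionary lookup regardless of which form the caller supplied.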

class biseqt.sequence.Sequence(alphabet, contents=())

Bases: object

An immutable sequence of letters from some Alphabet which behaves mostly like a tuple.

alphabet: Alphabet – The Alphabet of the sequence.

contents: tuple – The contents of the sequence as a tuple of integers, one per letter, where each letter is represented by its position in the alphabet.

content_id: string – Hex representation of the SHA-1 hash of the sequence.

Initializes the sequence object: translates all letters to integers corresponding to the position of each letter in the alphabet.

Parameters:

alphabet (Alphabet) – The `Alphabet` of the sequence.
contents (iterable) – The contents of the sequence as an iterable, each element of which is the integer representation of a letter from the `Alphabet`; default is an empty sequence. If the alphabet letter length is one, this argument can be a string.
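
A content identifier like the `content_id` attribute can be computed with the standard `hashlib` module. The sketch below is an illustration only: which bytes the library actually hashes is not documented here, so hashing the UTF-8 encoded letter string is an assumption, and `content_id_of` is a hypothetical helper.

```python
import hashlib


def content_id_of(contents, letters):
    """Hex SHA-1 digest of the sequence's letter string (assumed input)."""
    text = ''.join(letters[idx] for idx in contents)
    return hashlib.sha1(text.encode('utf-8')).hexdigest()


letters = tuple('ACGT')
seq_id = content_id_of((0, 2, 2, 2, 3), letters)
print(len(seq_id))  # 40 hex characters for a 160-bit SHA-1 digest
```

Two sequences with equal contents get the same identifier, which is what makes such a digest usable as a lookup key.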

reverse()

Returns a new sequence whose contents are those of this sequence in reverse order.

Returns:	Sequence

transform(mappings={}): Wraps Alphabet.transform() for convenience.