biseqt.sequence module

class biseqt.sequence.Alphabet(letters)[source]

Bases: object

A sequence alphabet.

_letters

tuple – The letters in the alphabet. All getitem operations (i.e. indexing and slicing) are delegated to this tuple. This attribute should be considered read-only.

_letlen

int – The length of each letter in the alphabet when represented as a string. This attribute should be considered read-only.

Parameters:letters (iterable) – The elements of this iterable must be hashable, i.e. usable as dictionary keys, and must respond to len(). Typically, they are single-character strings.
letter_to_idx(letters)[source]

Translates provided letters to the integer sequence corresponding to the index of each letter in this alphabet.

Parameters:letters (iterable) – The letters to be translated to integer indices. Each element retrieved through iteration should be an element in _letters.
Returns:tuple
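
The translation can be pictured with a minimal standalone sketch. This is not biseqt's code: the function name letter_to_idx_sketch and the dictionary lookup are assumptions made for illustration, and the real method may behave differently for letters outside the alphabet.

```python
def letter_to_idx_sketch(alphabet_letters, letters):
    """Hypothetical stand-in for Alphabet.letter_to_idx (not biseqt code)."""
    # Build the letter -> position table once, up front.
    index_of = {letter: idx for idx, letter in enumerate(alphabet_letters)}
    # In this sketch a letter outside the alphabet raises KeyError.
    return tuple(index_of[letter] for letter in letters)


indices = letter_to_idx_sketch(('A', 'C', 'G', 'T'), 'AGGGT')
print(indices)  # (0, 2, 2, 2, 3)
```

The hashability requirement on letters is what makes a dictionary lookup like this possible.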
parse(string)[source]

Given a string representation of a sequence, returns the corresponding Sequence object.

Parameters:string (str) – The raw sequence represented as a string.
Returns:Sequence
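
When letters are longer than one character, parsing presumably has to cut the raw string into chunks of _letlen characters first. A minimal sketch of that step follows, assuming fixed-width letters. The helper name split_letters_sketch and the ValueError on a trailing partial letter are assumptions, not documented biseqt behavior.

```python
def split_letters_sketch(string, letlen):
    """Hypothetical helper: cut a raw string into letters of length letlen."""
    if len(string) % letlen != 0:
        # Assumption: a trailing partial letter is treated as an error.
        raise ValueError('string length must be a multiple of the letter length')
    return tuple(string[i:i + letlen] for i in range(0, len(string), letlen))


single = split_letters_sketch('AGGGT', 1)
double = split_letters_sketch('AaBbAa', 2)
print(single)  # ('A', 'G', 'G', 'G', 'T')
print(double)  # ('Aa', 'Bb', 'Aa')
```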
transform(seq, mappings={})[source]

Transforms the given sequence to another sequence in the same alphabet according to provided letter-to-letter mappings.

Parameters:
  • seq (Sequence) – The original sequence.
  • mappings (list|dict) – If a dictionary is given, each entry is a one-way translation rule from the key to the value. If a list is given, each entry must have exactly two elements and is taken as a bidirectional translation rule between them. In both cases, each element can be either a letter in string format or an integer giving the position of the letter in the alphabet.
Returns:

Sequence

For example, to get the complement of a DNA sequence:

>>> from biseqt.sequence import Alphabet
>>> A = Alphabet('ACGT')
>>> S = A.parse('AGGGT')
>>> print A.transform(S, mappings=['AT', 'CG'])
'TCCCA'

whereas to get the same effect with a dictionary:

>>> mappings = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
>>> print A.transform(S, mappings=mappings)
'TCCCA'
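
The two mapping formats can be reduced to a single index-to-index table before being applied. The sketch below shows one way to do that on plain tuples. The names normalize_mappings_sketch and transform_sketch are hypothetical, and leaving unmapped letters unchanged is an assumption rather than documented behavior.

```python
def normalize_mappings_sketch(alphabet_letters, mappings):
    """Hypothetical helper: turn list or dict rules into an index -> index dict."""
    def to_idx(item):
        # Integers are taken as positions; anything else is looked up as a letter.
        return item if isinstance(item, int) else alphabet_letters.index(item)

    rules = {}
    if isinstance(mappings, dict):
        # Dictionary entries are one-way rules: key -> value.
        for src, dst in mappings.items():
            rules[to_idx(src)] = to_idx(dst)
    else:
        # List entries are two-element, bidirectional rules.
        for first, second in mappings:
            rules[to_idx(first)] = to_idx(second)
            rules[to_idx(second)] = to_idx(first)
    return rules


def transform_sketch(alphabet_letters, contents, mappings):
    """Apply the rules to a tuple of letter indices."""
    rules = normalize_mappings_sketch(alphabet_letters, mappings)
    # Assumption: indices without a rule are passed through unchanged.
    return tuple(rules.get(idx, idx) for idx in contents)


letters = ('A', 'C', 'G', 'T')
seq = (0, 2, 2, 2, 3)  # AGGGT as positions in the alphabet
from_list = transform_sketch(letters, seq, ['AT', 'CG'])
from_dict = transform_sketch(letters, seq, {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'})
print(from_list)  # (3, 1, 1, 1, 0), i.e. TCCCA
```

Both calls yield the same tuple, which mirrors the equivalence of the list and dictionary forms in the example above.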
class biseqt.sequence.Sequence(alphabet, contents=())[source]

Bases: object

An immutable sequence of letters from some Alphabet which behaves mostly like a tuple.

alphabet

Alphabet – The Alphabet of the sequence.

contents

tuple – The contents of the sequence as a tuple of integers with one entry per letter, where each letter is represented by its position in the alphabet.

content_id

string – Hexadecimal representation of the SHA1 hash of the sequence.
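
A content identifier of this kind can be computed with the standard hashlib module. The sketch below hashes the string form of the sequence. This page does not say which bytes biseqt actually feeds to SHA1, so that choice and the name content_id_sketch are assumptions.

```python
import hashlib


def content_id_sketch(sequence_string):
    """Hypothetical helper: SHA1 hex digest of a sequence's string form."""
    # Assumption: the string form, UTF-8 encoded, is what gets hashed.
    return hashlib.sha1(sequence_string.encode('utf-8')).hexdigest()


cid = content_id_sketch('AGGGT')
print(len(cid))  # 40
```

A SHA1 hex digest is always 40 characters long, and equal contents always give equal identifiers.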

Initializes the sequence object: translates all letters to integers corresponding to the position of each letter in the alphabet.

Parameters:
  • alphabet (Alphabet) – The Alphabet of the sequence.
  • contents (iterable) – The contents of the sequence as an iterable, each element of which is the integer representation of a letter from the Alphabet; default is an empty sequence. If the alphabet letter length is one, this argument can be a string.
reverse()[source]

Returns a new sequence whose contents are those of this sequence in reverse order.

Returns:Sequence
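
Because a Sequence is immutable, reversing produces a new object and leaves the original untouched. A minimal sketch on plain tuples of letter indices illustrates this. The name reverse_sketch is hypothetical and this is not biseqt's implementation.

```python
def reverse_sketch(contents):
    """Hypothetical helper: return a reversed copy of a tuple of indices."""
    return tuple(reversed(contents))


original = (0, 2, 2, 2, 3)  # AGGGT over the alphabet ACGT
flipped = reverse_sketch(original)
print(flipped)   # (3, 2, 2, 2, 0)
print(original)  # (0, 2, 2, 2, 3)
```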
transform(mappings={})[source]

Wraps Alphabet.transform() for convenience.