Class to describe the DNA sequence of a chromosome, in a compact manner.
#include <CompactDnaSequence.hpp>
Public Member Functions
Return type | Member | Description
 | CompactDnaSequence(const std::string &name, bool circular, const void *packedData, const util::Md5Digest &md5, size_t length, const std::vector<AmbiguousRegion> amb) |
std::string | getSequence(int64_t pos, int64_t length) const | Return the sequence of IUPAC codes for the chromosome as if by repeatedly calling CompactDnaSequence::getBase().
std::string | getUnambiguousSequence(int64_t pos, int64_t length) const | Return an unambiguous sequence of base calls as if by repeatedly calling CompactDnaSequence::getUnambiguousBase().
void | appendSequence(std::string &seq, int64_t pos, int64_t length) const | Append the sequence of IUPAC codes as if by repeatedly calling CompactDnaSequence::getBase().
void | appendUnambiguousSequence(std::string &seq, int64_t pos, int64_t length) const | Append an unambiguous sequence of base calls as if by repeatedly calling CompactDnaSequence::getUnambiguousBase().
char | getBase(int64_t pos) const | Get the IUPAC code for this chromosome at position pos.
char | getUnambiguousBase(int64_t pos) const | Return an unambiguous base call for this chromosome at position pos (as by util::BaseUtil::disambiguate(char)) that is consistent with the IUPAC code for the chromosome at this position.
size_t | extendLeftBy3Mers(size_t pos, size_t count) const | Return pos, extended to the left until it has passed by count distinct 3-mers of unambiguous reference sequence.
size_t | extendRightBy3Mers(size_t pos, size_t count) const | Return pos, extended to the right until it has passed by count distinct 3-mers of unambiguous reference sequence.
void | validate() const | Verify that the md5s recorded in the crr file metadata are the same as the md5s produced by re-computing them on the data.
const std::string & | getName() const | Return the name of this chromosome.
bool | isCircular() const | Return whether this chromosome is circular.
const util::Md5Digest & | getMd5Digest() const | Return the md5 digest of the chromosome's sequence.
size_t | length() const | Return the length in bases of the chromosome.
const std::vector<AmbiguousRegion> & | getAmbiguousRegions() const | Return the list of AmbiguousRegion for this chromosome, in order by position.
Class to describe the DNA sequence of a chromosome, in a compact manner.
Used internally by the CrrFile class.
void cgatools::reference::CompactDnaSequence::appendSequence(std::string &seq, int64_t pos, int64_t length) const
Append the sequence of IUPAC codes as if by repeatedly calling CompactDnaSequence::getBase().
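The "as if by repeatedly calling getBase()" contract can be illustrated with a toy model. The sketch below is not the cgatools implementation: `ToySequence` is a hypothetical stand-in that stores IUPAC codes in a plain std::string and handles linear chromosomes only. It shows only the relationship between the per-base accessor and the range accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for CompactDnaSequence: stores IUPAC codes
// uncompressed and supports linear chromosomes only.
class ToySequence {
public:
    explicit ToySequence(std::string codes) : codes_(std::move(codes)) {}

    // Toy getBase(): pos must lie in [0, length of the sequence).
    char getBase(int64_t pos) const {
        assert(pos >= 0 && pos < static_cast<int64_t>(codes_.size()));
        return codes_[static_cast<std::size_t>(pos)];
    }

    // Appends exactly what `length` successive getBase() calls return,
    // leaving any existing contents of seq in place.
    void appendSequence(std::string& seq, int64_t pos, int64_t length) const {
        for (int64_t i = 0; i < length; ++i) {
            seq.push_back(getBase(pos + i));
        }
    }

    // Same contract as appendSequence(), but returns a fresh string.
    std::string getSequence(int64_t pos, int64_t length) const {
        std::string seq;
        appendSequence(seq, pos, length);
        return seq;
    }

private:
    std::string codes_;
};
```

In this model, appending to a non-empty string keeps the existing prefix, which is the practical difference between the append and get variants.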
void cgatools::reference::CompactDnaSequence::appendUnambiguousSequence(std::string &seq, int64_t pos, int64_t length) const
Append an unambiguous sequence of base calls as if by repeatedly calling CompactDnaSequence::getUnambiguousBase().
size_t cgatools::reference::CompactDnaSequence::extendLeftBy3Mers(size_t pos, size_t count) const
Return pos, extended to the left until it has passed by count distinct 3-mers of unambiguous reference sequence.
This function stops at the chromosome end, even for circular chromosomes.
size_t cgatools::reference::CompactDnaSequence::extendRightBy3Mers(size_t pos, size_t count) const
Return pos, extended to the right until it has passed by count distinct 3-mers of unambiguous reference sequence.
This function stops at the chromosome end, even for circular chromosomes.
const std::vector<AmbiguousRegion>& cgatools::reference::CompactDnaSequence::getAmbiguousRegions() const [inline]
Return the list of AmbiguousRegion for this chromosome, in order by position.
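Because the regions are returned in order by position, a caller can locate the region covering a given position with a binary search. The sketch below assumes a hypothetical `ToyAmbiguousRegion` layout (code, offset, length) and assumes the regions do not overlap. The fields of the real AmbiguousRegion are not described on this page, so the names here are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical layout; the real AmbiguousRegion fields are not shown here.
struct ToyAmbiguousRegion {
    char code;           // IUPAC code covering the whole region
    std::size_t offset;  // first position of the region
    std::size_t length;  // number of positions in the region
};

// Returns the region containing pos, or nullptr if pos is unambiguous.
// Assumes regions are sorted by offset and do not overlap.
inline const ToyAmbiguousRegion* findRegion(
    const std::vector<ToyAmbiguousRegion>& regions, std::size_t pos) {
    // First region that starts strictly after pos.
    auto it = std::upper_bound(
        regions.begin(), regions.end(), pos,
        [](std::size_t p, const ToyAmbiguousRegion& r) { return p < r.offset; });
    if (it == regions.begin()) {
        return nullptr;  // every region starts after pos
    }
    --it;  // last region starting at or before pos
    return (pos < it->offset + it->length) ? &*it : nullptr;
}
```

The lookup costs O(log n) in the number of regions, which is the main benefit of the sorted order.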
char cgatools::reference::CompactDnaSequence::getBase(int64_t pos) const
Get the IUPAC code for this chromosome at position pos.
For circular chromosomes, pos is allowed to range from -length to 2*length-1.
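The documented range suggests that a position may wrap around the chromosome origin at most once in either direction. The sketch below shows one way such a position could be mapped back into [0, length). `normalizeCircularPos` is a hypothetical helper, and the wrap-around behavior is an assumption drawn from the stated range, not a confirmed detail of the cgatools implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a position on a circular chromosome into
// [0, length). Accepts only the documented range [-length, 2*length-1].
inline int64_t normalizeCircularPos(int64_t pos, int64_t length) {
    assert(length > 0);
    assert(pos >= -length && pos <= 2 * length - 1);
    if (pos < 0) {
        return pos + length;  // wrapped past the start
    }
    if (pos >= length) {
        return pos - length;  // wrapped past the end
    }
    return pos;               // already in range
}
```

With a length of 10, position -1 would map to 9 and position 10 would map to 0 under this assumption.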
const util::Md5Digest& cgatools::reference::CompactDnaSequence::getMd5Digest() const [inline]
Return the md5 digest of the chromosome's sequence.
In particular, this is the md5 of the IUPAC codes of the chromosome, converted to upper case.
std::string cgatools::reference::CompactDnaSequence::getSequence(int64_t pos, int64_t length) const
Return the sequence of IUPAC codes for the chromosome as if by repeatedly calling CompactDnaSequence::getBase().
char cgatools::reference::CompactDnaSequence::getUnambiguousBase(int64_t pos) const
Return an unambiguous base call for this chromosome at position pos (as by util::BaseUtil::disambiguate(char)) that is consistent with the IUPAC code for the chromosome at this position.
For circular chromosomes, pos is allowed to range from -length to 2*length-1.
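What "consistent with the IUPAC code" means can be sketched with the standard IUPAC nucleotide table. In the example below, `compatibleBases`, `pickConsistentBase`, and `isConsistent` are hypothetical names. Picking the first compatible base is an arbitrary choice made for illustration, and util::BaseUtil::disambiguate(char) may select a different base for the same code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Bases compatible with an IUPAC nucleotide code (standard IUPAC table).
// Returns an empty string for characters that are not IUPAC codes.
inline std::string compatibleBases(char iupac) {
    switch (std::toupper(static_cast<unsigned char>(iupac))) {
        case 'A': return "A";
        case 'C': return "C";
        case 'G': return "G";
        case 'T': return "T";
        case 'R': return "AG";
        case 'Y': return "CT";
        case 'S': return "CG";
        case 'W': return "AT";
        case 'K': return "GT";
        case 'M': return "AC";
        case 'B': return "CGT";
        case 'D': return "AGT";
        case 'H': return "ACT";
        case 'V': return "ACG";
        case 'N': return "ACGT";
        default:  return "";
    }
}

// Hypothetical disambiguation: picks the first compatible base.
// The real util::BaseUtil::disambiguate(char) may choose differently.
inline char pickConsistentBase(char iupac) {
    const std::string bases = compatibleBases(iupac);
    assert(!bases.empty());
    return bases.front();
}

// True if an upper-case base (A, C, G, or T) is allowed by the IUPAC code.
inline bool isConsistent(char base, char iupac) {
    return compatibleBases(iupac).find(base) != std::string::npos;
}
```

Whatever base a disambiguation routine picks, the property a caller can rely on is the one `isConsistent` checks: the result is one of the bases the IUPAC code permits.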
std::string cgatools::reference::CompactDnaSequence::getUnambiguousSequence(int64_t pos, int64_t length) const
Return an unambiguous sequence of base calls as if by repeatedly calling CompactDnaSequence::getUnambiguousBase().
void cgatools::reference::CompactDnaSequence::validate() const
Verify that the md5s recorded in the crr file metadata are the same as the md5s produced by re-computing them on the data.