| makeTranscriptDb {GenomicFeatures} | R Documentation |
makeTranscriptDb is a low-level constructor for making
a TranscriptDb object from user-supplied transcript annotations.
See ?makeTranscriptDbFromUCSC and
?makeTranscriptDbFromBiomart for higher-level
functions that feed data from the UCSC or BioMart sources
to makeTranscriptDb.
makeTranscriptDb(transcripts, splicings,
genes=NULL, chrominfo=NULL, metadata=NULL, ...)
transcripts |
data frame containing the genomic locations of a set of transcripts |
splicings |
data frame containing the exon and cds locations of a set of transcripts |
genes |
data frame containing the genes associated with a set of transcripts |
chrominfo |
data frame containing information about the chromosomes hosting the set of transcripts |
metadata |
2-column data frame containing meta information about
this set of transcripts, such as species, organism, genome, UCSC table, etc.
The names of the columns must be |
... |
ignored for now |
The transcripts (required), splicings (required)
and genes (optional) arguments must be data frames that
describe a set of transcripts and the genomic features related
to them (exons, cds and genes at the moment).
The chrominfo (optional) argument must be a data frame
containing chromosome information like the length of each chromosome.
transcripts must have 1 row per transcript and the following
columns:
tx_id: Transcript ID. Integer vector. No NAs. No duplicates.
tx_name: [optional] Transcript name. Character vector (or factor).
tx_chrom: Transcript chromosome. Character vector (or factor)
with no NAs.
tx_strand: Transcript strand. Character vector (or factor)
where each element is either "+" or "-".
tx_start, tx_end: Transcript start and end.
Integer vectors with no NAs.
Other columns, if any, are ignored (with a warning).
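The column rules above can be checked before calling the constructor. The following is a minimal sketch in base R, not the validation that makeTranscriptDb itself performs; check_transcripts is a hypothetical helper introduced only for illustration, and the integer-type requirement is not checked here.

```r
# Hypothetical helper (not part of GenomicFeatures): returns a character
# vector describing every rule listed above that 'transcripts' violates.
check_transcripts <- function(transcripts) {
  required <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
  missing_cols <- setdiff(required, names(transcripts))
  if (length(missing_cols) > 0L)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  problems <- character(0)
  if (anyNA(transcripts$tx_id) || anyDuplicated(transcripts$tx_id) > 0L)
    problems <- c(problems, "tx_id must have no NAs and no duplicates")
  if (anyNA(transcripts$tx_chrom))
    problems <- c(problems, "tx_chrom must have no NAs")
  # An NA strand also fails this test because NA is not in c("+", "-").
  if (!all(as.character(transcripts$tx_strand) %in% c("+", "-")))
    problems <- c(problems, "tx_strand must be \"+\" or \"-\"")
  if (anyNA(transcripts$tx_start) || anyNA(transcripts$tx_end))
    problems <- c(problems, "tx_start and tx_end must have no NAs")
  problems
}

transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1L, 2001L, 2001L),
  tx_end = c(999L, 2199L, 2199L))
ok_problems <- check_transcripts(transcripts)

bad <- transcripts
bad$tx_id[3] <- 1L       # duplicate transcript ID
bad$tx_strand[2] <- "*"  # strand outside "+" / "-"
bad_problems <- check_transcripts(bad)
```

Collecting the messages instead of stopping at the first one makes it easier to repair a hand-built table in a single pass.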
splicings must have N rows per transcript, where N is the number
of exons in the transcript. Each row describes an exon and, if there is
one, the cds contained in this exon. Its columns must be:
tx_id: Foreign key that links each row in the splicings
data frame to a unique row in the transcripts data frame.
Note that more than 1 row in splicings can be linked to the
same row in transcripts (many-to-one relationship).
Same type as transcripts$tx_id (integer vector). No NAs.
All the values in this column must be present in
transcripts$tx_id.
exon_rank: The rank of the exon in the transcript.
Integer vector with no NAs. (tx_id, exon_rank)
pairs must be unique.
exon_id: [optional] Exon ID.
Integer vector with no NAs.
exon_name: [optional] Exon name.
Character vector (or factor).
exon_chrom: [optional] Exon chromosome.
Character vector (or factor) with no NAs.
If missing then transcripts$tx_chrom is used.
If present then exon_strand must be present too.
exon_strand: [optional] Exon strand.
Character vector (or factor) with no NAs.
If missing then transcripts$tx_strand is used
and exon_chrom must be missing too.
exon_start, exon_end: Exon start and end.
Integer vectors with no NAs.
cds_id: [optional] cds ID. Integer vector.
If present then cds_start and cds_end must be too.
NAs are allowed and must match NAs in cds_start
and cds_end.
cds_name: [optional] cds name. Character vector (or factor).
If present then cds_start and cds_end must be too.
NAs are allowed and must match NAs in cds_start
and cds_end.
cds_start, cds_end: [optional] cds start and end.
Integer vectors.
If one of the 2 columns is missing then all cds_* columns
must be missing.
NAs are allowed and must occur at the same positions in
cds_start and cds_end.
Other columns, if any, are ignored (with a warning).
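Three of the splicings rules are relational rather than per-column: the tx_id foreign key, the uniqueness of (tx_id, exon_rank) pairs, and the matching NA positions in cds_start and cds_end. A minimal base R sketch of those three checks follows; check_splicings is a hypothetical helper, and the remaining rules (exon_chrom/exon_strand pairing, cds_id, cds_name) are left out for brevity.

```r
# Hypothetical helper (not part of GenomicFeatures): checks the relational
# rules on 'splicings' and returns a character vector of violations.
check_splicings <- function(splicings, transcripts) {
  problems <- character(0)
  # Foreign key: every splicings$tx_id must exist in transcripts$tx_id.
  if (anyNA(splicings$tx_id) || !all(splicings$tx_id %in% transcripts$tx_id))
    problems <- c(problems, "tx_id must have no NAs and must occur in transcripts$tx_id")
  # (tx_id, exon_rank) pairs must be unique.
  if (anyDuplicated(splicings[c("tx_id", "exon_rank")]) > 0L)
    problems <- c(problems, "(tx_id, exon_rank) pairs must be unique")
  # cds_start and cds_end go together, with NAs at the same positions.
  has_cds <- c("cds_start", "cds_end") %in% names(splicings)
  if (xor(has_cds[1], has_cds[2])) {
    problems <- c(problems, "cds_start and cds_end must both be present or both be missing")
  } else if (all(has_cds) &&
             !identical(is.na(splicings$cds_start), is.na(splicings$cds_end))) {
    problems <- c(problems, "NAs must occur at the same positions in cds_start and cds_end")
  }
  problems
}

transcripts <- data.frame(tx_id = 1:3)
splicings <- data.frame(
  tx_id = c(1L, 2L, 2L, 2L, 3L, 3L),
  exon_rank = c(1L, 1L, 2L, 3L, 1L, 2L),
  exon_start = c(1L, 2001L, 2101L, 2131L, 2001L, 2131L),
  exon_end = c(999L, 2085L, 2144L, 2199L, 2085L, 2199L),
  cds_start = c(1L, 2022L, 2101L, 2131L, NA, NA),
  cds_end = c(999L, 2085L, 2144L, 2193L, NA, NA))
ok_problems <- check_splicings(splicings, transcripts)

bad <- splicings
bad$exon_rank[3] <- 1L  # the pair (2, 1) now appears twice
bad$cds_end[1] <- NA    # NA in cds_end with no matching NA in cds_start
bad_problems <- check_splicings(bad, transcripts)
```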
genes must have N rows per transcript, where N is the number
of genes linked to the transcript (N will be 1 most of the time).
Its columns must be:
tx_id: [optional] genes must have either a
tx_id or a tx_name column but not both.
Like splicings$tx_id, this is a foreign key that
links each row in the genes data frame to a unique
row in the transcripts data frame.
tx_name: [optional]
Can be used as an alternative to the genes$tx_id
foreign key.
gene_id: Gene ID. Character vector (or factor). No NAs.
Other columns, if any, are ignored (with a warning).
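To illustrate the "tx_id or tx_name, but not both" rule, here is a minimal base R sketch that resolves each row of genes to a transcript ID through whichever key column is present. resolve_gene_tx_id is a hypothetical helper, and the sketch assumes that transcripts carries a tx_name column whenever genes is keyed by name.

```r
# Hypothetical helper (not part of GenomicFeatures): returns the tx_id that
# each row of 'genes' refers to, using the one key column that is present.
resolve_gene_tx_id <- function(genes, transcripts) {
  has_id <- "tx_id" %in% names(genes)
  has_name <- "tx_name" %in% names(genes)
  if (has_id == has_name)
    stop("'genes' must have either a tx_id or a tx_name column, but not both")
  if (anyNA(genes$gene_id))
    stop("gene_id must have no NAs")
  if (has_id)
    return(genes$tx_id)
  # Assumption: transcripts$tx_name exists when genes is keyed by tx_name.
  idx <- match(as.character(genes$tx_name), as.character(transcripts$tx_name))
  transcripts$tx_id[idx]
}

transcripts <- data.frame(tx_id = 1:3, tx_name = c("txA", "txB", "txC"))
genes_by_id <- data.frame(
  tx_id = c(1L, 2L, 3L),
  gene_id = c("g1", "g2", "g2"))
genes_by_name <- data.frame(
  tx_name = c("txA", "txB", "txC"),
  gene_id = c("g1", "g2", "g2"))

ids_from_id <- resolve_gene_tx_id(genes_by_id, transcripts)
ids_from_name <- resolve_gene_tx_id(genes_by_name, transcripts)

# Supplying both key columns is rejected.
both_keys <- cbind(genes_by_id, tx_name = c("txA", "txB", "txC"))
both_result <- try(resolve_gene_tx_id(both_keys, transcripts), silent = TRUE)
```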
chrominfo must have 1 row per chromosome and the following
columns:
chrom: Chromosome name.
Character vector (or factor) with no NAs.
length: Chromosome length.
Either all NAs or an integer vector with no NAs.
is_circular: [optional] Chromosome circularity flag.
Either all NAs or a logical vector with no NAs.
Other columns, if any, are ignored (with a warning).
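The "either all NAs or no NAs" rule for length and is_circular is easy to get wrong when only some chromosome lengths are known. A minimal base R sketch of that check is shown below; check_chrominfo is a hypothetical helper and the chromosome lengths are toy values, not real genome data.

```r
# Hypothetical helper (not part of GenomicFeatures): returns a character
# vector describing every chrominfo rule listed above that is violated.
check_chrominfo <- function(chrominfo) {
  missing_cols <- setdiff(c("chrom", "length"), names(chrominfo))
  if (length(missing_cols) > 0L)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  # TRUE when a column mixes NAs with non-NA values.
  mixes_na <- function(x) any(is.na(x)) && !all(is.na(x))
  problems <- character(0)
  if (anyNA(chrominfo$chrom))
    problems <- c(problems, "chrom must have no NAs")
  if (mixes_na(chrominfo$length))
    problems <- c(problems, "length must be all NAs or have no NAs")
  if ("is_circular" %in% names(chrominfo) && mixes_na(chrominfo$is_circular))
    problems <- c(problems, "is_circular must be all NAs or have no NAs")
  problems
}

chrominfo <- data.frame(
  chrom = c("chr1", "chrM"),
  length = c(5000L, 300L),         # toy lengths for illustration only
  is_circular = c(FALSE, TRUE))
ok_problems <- check_chrominfo(chrominfo)

bad <- chrominfo
bad$length[2] <- NA                # one known length, one unknown
bad_problems <- check_chrominfo(bad)
```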
A TranscriptDb object.
H. Pages
TranscriptDb,
makeTranscriptDbFromUCSC,
makeTranscriptDbFromBiomart
library(GenomicFeatures)

## Three transcripts on chr1; transcripts 2 and 3 span the same range.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))

## One row per exon. Transcript 3 has no cds, hence the NAs at the
## same positions in cds_start and cds_end.
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA))

txdb <- makeTranscriptDb(transcripts, splicings)