The feature table contains information about genes and gene products, as well as regions of biological significance reported in the sequence, including regions that code for proteins and RNA molecules. It also enumerates differences between different reports of the same sequence and provides cross-references to other data collections, as described in more detail below.
The first column of the feature descriptor line contains the feature key. The list of valid feature keys is shown below.
keyword | description |
---|---|
allele | Related strain contains alternative gene form. |
attenuator | Sequence related to transcription termination. |
C_region | Span of the C immunological feature. |
CAAT_signal | "CAAT box" in eukaryotic promoters. |
CDS | Sequence coding for amino acids in protein (includes stop codon). |
conflict | Independent determinations differ. |
D-loop | Displacement loop. |
D-segment | Diversity segment of immunoglobulin heavy chain and T-cell receptor beta-chain. |
enhancer | Cis-acting enhancer of promoter function. |
exon | Region that codes for part of spliced mRNA. |
GC_signal | "GC box" in eukaryotic promoters. |
iDNA | Intervening DNA eliminated by recombination. |
intron | Transcribed region excised by mRNA splicing. |
J-segment | Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma-chains. |
LTR | Long terminal repeat. |
mat_peptide | Mature peptide coding region (does not include stop codon). |
misc_binding | Miscellaneous binding site. |
misc_difference | Miscellaneous difference feature. |
misc_feature | Region of biological significance that cannot be described by any other feature. |
misc_recomb | Miscellaneous recombination feature. |
misc_RNA | Miscellaneous transcript feature not defined by other RNA keys. |
misc_signal | Miscellaneous signal. |
misc_structure | Miscellaneous DNA or RNA structure. |
modified_base | The indicated base is a modified nucleotide. |
mRNA | Messenger RNA. |
mutation | A mutation alters the sequence here. |
N_region | Span of the N immunological feature. |
old_sequence | Presented sequence revises a previous version. |
polyA_signal | Signal for cleavage & polyadenylation. |
polyA_site | Site at which polyadenine is added to mRNA. |
precursor_RNA | Any RNA species that is not yet the mature RNA product. |
prim_transcript | Primary (unprocessed) transcript. |
primer_bind | Non-covalent primer binding site. |
promoter | A region involved in transcription initiation. |
protein_bind | Non-covalent protein binding site on DNA or RNA. |
RBS | Ribosome binding site. |
rep_origin | Replication origin for duplex DNA. |
repeat_region | Sequence containing repeated subsequences. |
repeat_unit | One repeated unit of a repeat_region. |
rRNA | Ribosomal RNA. |
S_region | Span of the S immunological feature. |
satellite | Satellite repeated sequence. |
scRNA | Small cytoplasmic RNA. |
sig_peptide | Signal peptide coding region. |
snRNA | Small nuclear RNA. |
source | Biological source of the specified span of sequence. |
stem_loop | Hair-pin loop structure in DNA or RNA. |
STS | Sequence Tagged Site; operationally unique sequence that identifies the combination of primer spans used in a PCR assay. |
TATA_signal | "TATA box" in eukaryotic promoters. |
terminator | Sequence causing transcription termination. |
transit_peptide | Transit peptide coding region. |
tRNA | Transfer RNA. |
unsure | Authors are unsure about the sequence in this region. |
V_region | Span of the V immunological feature. |
V_segment | Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma chains. |
variation | A related population contains stable mutation. |
virion | Virion (encapsidated) viral sequence. |
- (hyphen) | Placeholder. |
-10_signal | "Pribnow box" in prokaryotic promoters. |
-35_signal | "-35 box" in prokaryotic promoters. |
3'clip | 3'-most region of a precursor transcript removed in processing. |
3'UTR | 3' untranslated region (trailer). |
5'clip | 5'-most region of a precursor transcript removed in processing. |
5'UTR | 5' untranslated region (leader). |

The second column of the feature descriptor line
designates the location of the feature in the sequence. Several
conventions are used to indicate sequence location.

Base numbers in location descriptors refer to
numbering in the entry, which is not necessarily the same as the
numbering scheme used in the published report. The first base in
the presented sequence is numbered base 1. Sequences are
presented in the 5' to 3' direction.

Location descriptors can be one of the following:

A site between two residues, such as an
endonuclease cleavage site, is indicated by listing the two bases
separated by a caret (e.g., 23^24).

A single residue chosen from a range of residues is
indicated by the number of the first and last bases in the range
separated by a single period (e.g., 23.79).

The symbols < and > indicate that the end
point of the range is beyond the specified base number.

A contiguous span of bases is indicated by the
number of the first and last bases in the range separated by two
periods (e.g., 23..79).
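
The three descriptor forms above, together with the `<` and `>` markers, can be told apart mechanically. The following is a minimal sketch, not a complete parser: the names `Location`, `parse_location`, and `extract_span` are hypothetical, and it assumes a descriptor consists of exactly two base numbers with no operator or entry reference around them.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    kind: str         # "between", "one_of_range", or "span"
    start: int        # first base number, as numbered in the entry (base 1 is first)
    end: int          # last base number, inclusive
    start_open: bool  # True if written with '<' (feature begins before start)
    end_open: bool    # True if written with '>' (feature continues past end)


# The separator between the two base numbers decides the kind of descriptor.
_KINDS = {"^": "between", ".": "one_of_range", "..": "span"}
_PATTERN = re.compile(r"^(<?)(\d+)(\^|\.\.|\.)(>?)(\d+)$")


def parse_location(text: str) -> Location:
    """Classify a simple two-number location descriptor."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location descriptor: {text!r}")
    lt, first, sep, gt, last = match.groups()
    return Location(_KINDS[sep], int(first), int(last), bool(lt), bool(gt))


def extract_span(sequence: str, loc: Location) -> str:
    """Return the bases covered by a contiguous span.

    Entry numbering starts at 1, so base 1 is sequence[0].
    """
    if loc.kind != "span":
        raise ValueError("only contiguous spans can be sliced")
    return sequence[loc.start - 1 : loc.end]
```

For example, `parse_location("23^24").kind` is `"between"`, and `parse_location("1..>66").end_open` is `True`. Anything more elaborate is rejected with a `ValueError` rather than guessed at.
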
Operators are prefixes that specify what must be
done to the indicated sequence to locate the feature. The
following are the operators available, along with their most
common format and a description.
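
Because an operator is written as a prefix followed by a parenthesized argument, as in `replace(258..258,"t")`, a reader can separate the operator name from what it applies to before interpreting either part. The sketch below shows one way to do that. The function name `split_operator` is hypothetical, and the code assumes the operator name is made of letters, underscores, or hyphens and that the closing parenthesis is the last character of the location.

```python
import re
from typing import Optional, Tuple

# name(arguments): the name is captured first, then everything up to the final ')'.
_OPERATOR = re.compile(r"^([A-Za-z_-]+)\((.*)\)$", re.DOTALL)


def split_operator(location: str) -> Tuple[Optional[str], str]:
    """Split 'name(args)' into (name, args).

    A location with no operator prefix is returned as (None, location).
    The argument text is returned unparsed.
    """
    text = location.strip()
    match = _OPERATOR.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2)
```

Calling `split_operator('replace(258..258,"t")')` yields the name `replace` and the argument text `258..258,"t"`. A plain span such as `5..1261` comes back unchanged with `None` as the operator.
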
Qualifiers provide additional information about
features. They take the form of a slash (/) followed by a
qualifier name and, if applicable, an equal sign (=) and a
qualifier value. Qualifiers convey many types of information.
Therefore, their values can take free text, controlled
vocabulary, enumerated values, reference numbers, sequences, or
feature labels.
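
The slash, name, and optional `=value` structure described above is simple enough to split by hand. The following is a minimal sketch under stated assumptions: `parse_qualifier` is a hypothetical helper, the qualifier is assumed to fit on one line, and surrounding double quotes are stripped from free-text values as a convenience rather than as a rule taken from this section.

```python
from typing import Optional, Tuple


def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/name=value' into (name, value); '/name' alone gives (name, None)."""
    text = text.strip()
    if not text.startswith("/"):
        raise ValueError(f"a qualifier must start with a slash: {text!r}")
    # Split on the first '=' only, so values may themselves contain '=' or '/'.
    name, sep, value = text[1:].partition("=")
    if not sep:
        return name, None
    # Assumption: quoted free text is returned without its enclosing quotes.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return name, value
```

With this helper, `/gene="PI"` becomes `("gene", "PI")`, `/number=1` becomes `("number", "1")`, and a value-less qualifier such as `/pseudo` becomes `("pseudo", None)`. Values are left as strings; interpreting them (as numbers, locations, or controlled vocabulary) depends on the qualifier.
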
The following is a list of valid feature
qualifiers.
qualifier | description |
---|---|
anticodon | Location of the anticodon of tRNA and the amino acid for which it codes |
bound_moiety | Moiety bound |
cell_line | Cell line from which the sequence was obtained |
cell_type | Cell type from which the sequence was obtained |
chromosome | Chromosome from which the sequence was obtained |
chloroplast | Organelle type from which the sequence was obtained |
chromoplast | Organelle type from which the sequence was obtained |
citation | Reference to a citation providing the claim of or evidence for a feature |
clone | Clone from which the sequence was obtained |
clone_lib | Clone library from which the sequence was obtained |
codon | Specifies a codon that is different from any found in the reference genetic code |
codon_start | Indicates the first base of the first complete codon in a CDS (as 1 or 2 or 3) |
cons_splice | Identifies intron splice sites that do not conform to the 5'-GT...AG-3' splice site consensus |
cultivar | Variety of plant from which the sequence was obtained |
cyanelle | Organelle type from which the sequence was obtained |
db_xref | Cross-reference to an external database |
dev_stage | Developmental stage of source organism |
direction | Direction of DNA replication |
EC_number | Enzyme Commission number for the enzyme product of the sequence |
evidence | Value indicating the nature of supporting evidence |
frequency | Frequency of the occurrence of a feature |
function | Function attributed to a sequence |
gene | Symbol of the gene corresponding to a sequence region (usable with all features) |
gdb_xref | Genome Databank unique ID cross reference qualifier |
germline | Immunoglobulin unrearranged DNA |
haplotype | Haplotype of organism from which sequence was obtained |
insertion_seq | Insertion sequence element from which sequence was obtained |
isolate | Individual isolate from which sequence was obtained |
kinetoplast | Organelle type from which sequence was obtained |
label | A label used to permanently identify a feature |
lab_host | Laboratory host used to propagate the organism from which sequence was obtained |
map | Map position of the feature in free-format text |
macronuclear | Macronuclear DNA |
mitochondrion | Organelle type from which sequence was obtained |
mod_base | Abbreviation for a modified nucleotide base |
note | Any comment or additional information |
number | A number indicating the order of genetic elements (e.g., exons or introns) in the 5' to 3' direction |
organism | Name of organism if different from that contained in the entry's ORGANISM field |
partial | Differentiates between complete regions and partial ones |
PCR_conditions | PCR reaction conditions and components |
pop_variant | Population variant from which sequence was obtained |
phenotype | Phenotype conferred by the feature |
plasmid | Name of plasmid from which sequence was obtained |
product | Name of a product encoded by the sequence |
proviral | Viral sequence integrated into another organism's genome |
pseudo | Indicates that this feature is a non-functional version of the element named by the feature key |
rearranged | Immunoglobulin rearranged DNA |
rpt_family | Type of repeated sequence; Alu or Kpn, for example |
rpt_type | Organization of repeated sequence |
rpt_unit | Identity of repeat unit that constitutes a repeat_region |
serotype | Serotype from which sequence was obtained |
sex | Sex of organism from which sequence was obtained |
sequenced_mol | Molecule from which sequence was obtained |
specific_host | Natural host from which sequence was obtained |
standard_name | Accepted standard name for this feature |
strain | Strain from which sequence was obtained |
sub_clone | Sub-clone from which sequence was obtained |
sub_species | Sub-species name of organism from which sequence was obtained |
sub_strain | Sub-strain from which sequence was obtained |
tissue_lib | Tissue library from which sequence was obtained |
tissue_type | Tissue type from which sequence was obtained |
translation | Amino acid translation of coding region (automatically generated) |
transl_except | Translational exception: single codon, the translation of which does not conform to the reference genetic code |
transl_table | Genetic code table |
transposon | Transposable element from which sequence was obtained |
type | Name of a strain if different from that in the SOURCE field |
usedin | Indicates that feature is used in a compound feature in another entry |
variety | Variety from which sequence was obtained |

     CDS             5..1261
                     /product="alpha-1-antitrypsin precursor"
                     /map="14q32.1"
                     /gene="PI"
     tRNA            1..87
                     /note="Leu-tRNA-CAA (NAR: 1057)"
                     /anticodon=(pos:35..37,aa:Leu)
     mRNA            1..>66
                     /note="alpha-1-acid glycoprotein mRNA"
     transposon      <1..267
                     /note="insertion element IS5"
     misc_recomb     105^106
                     /note="B.subtilis DNA end/IS5 DNA start"
     conflict        replace(258..258,"t")
                     /citation="[2]"
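
Feature entries like those above can be grouped into records by treating each line that starts with a slash as a qualifier of the most recent feature key. This is a minimal sketch, not a full reader: `parse_feature_table` is a hypothetical name, and the code assumes one feature key and location per descriptor line, one qualifier per line, and no values or locations continued across lines.

```python
from typing import Dict, Iterable, List


def parse_feature_table(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Group feature descriptor lines with the qualifier lines that follow them."""
    features: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("/"):
            # A qualifier belongs to the feature named on the nearest line above.
            if not features:
                raise ValueError("qualifier found before any feature key")
            features[-1]["qualifiers"].append(line)
        else:
            # First column is the feature key, second column is the location.
            key, location = line.split(None, 1)
            features.append({"key": key, "location": location, "qualifiers": []})
    return features
```

Given the `CDS` entry above, the first record would have the key `CDS`, the location `5..1261`, and three qualifier strings. Multi-line qualifier values would need extra handling that this sketch leaves out.
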

    LOCUS       HUMPGAMM1 3688 bp ds-DNA PRI 15-OCT-1990
    DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M) gene, 5' end.
    ACCESSION   M55673 M25818 M27095
    KEYWORDS    phosphoglycerate mutase.
    SEGMENT     1 of 2
      .
      .
    FEATURES             Location/Qualifiers
         CAAT_signal     1751..1755
                         /gene="PGAM-M"
         TATA_signal     1791..1799
                         /gene="PGAM-M"
         exon            1820..2274
                         /number=1
                         /EC_number="5.4.2.1"
                         /gene="PGAM-M"
         intron          2275..2377
                         /number=1
                         /gene="PGAM2"
         exon            2378..2558
                         /number=2
                         /gene="PGAM-M"
      .
      .
    //
    LOCUS       HUMPGAMM2 677 bp ds-DNA PRI 15-OCT-1990
    DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M), exon 3.
    ACCESSION   M55674 M25818 M27096
    KEYWORDS    phosphoglycerate mutase.
    SEGMENT     2 of 2
      .
      .
    FEATURES             Location/Qualifiers
         exon            255..457
                         /number=3
                         /gene="PGAM-M"
         intron          order(M55673:2559..>3688,