Feature Table Information


   The feature table contains information about genes and gene products, as well as regions of biological significance reported in the sequence, including regions that code for proteins and RNA molecules. It also enumerates differences between different reports of the same sequence and provides cross-references to other data collections, as described in more detail below.

Feature Keys

   The first column of the feature descriptor line contains the feature key. The list of valid feature keys is shown below.

keyword description
allele Related strain contains alternative gene form.
attenuator Sequence related to transcription termination.
C_region Span of the C immunological feature.
CAAT_signal `CAAT box' in eukaryotic promoters.
CDS Sequence coding for amino acids in protein (includes stop codon).
conflict Independent determinations differ.
D-loop Displacement loop.
D-segment Diversity segment of immunoglobulin heavy chain and T-cell receptor beta-chain.
enhancer Cis-acting enhancer of promoter function.
exon Region that codes for part of spliced mRNA.
GC_signal `GC box' in eukaryotic promoters.
iDNA Intervening DNA eliminated by recombination.
intron Transcribed region excised by mRNA splicing.
J-segment Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma-chains.
LTR Long terminal repeat.
mat_peptide Mature peptide coding region (does not include stop codon).
misc_binding Miscellaneous binding site.
misc_difference Miscellaneous difference feature.
misc_feature Region of biological significance that cannot be described by any other feature.
misc_recomb Miscellaneous recombination feature.
misc_RNA Miscellaneous transcript feature not defined by other RNA keys.
misc_signal Miscellaneous signal.
misc_structure Miscellaneous DNA or RNA structure.
modified_base The indicated base is a modified nucleotide.
mRNA Messenger RNA.
mutation A mutation alters the sequence here.
N_region Span of the N immunological feature.
old_sequence Presented sequence revises a previous version.
polyA_signal Signal for cleavage & polyadenylation.
polyA_site Site at which polyadenine is added to mRNA.
precursor_RNA Any RNA species that is not yet the mature RNA product.
prim_transcript Primary (unprocessed) transcript.
primer_bind Non-covalent primer binding site.
promoter A region involved in transcription initiation.
protein_bind Non-covalent protein binding site on DNA or RNA.
RBS Ribosome binding site.
rep_origin Replication origin for duplex DNA.
repeat_region Sequence containing repeated subsequences.
repeat_unit One repeated unit of a repeat_region.
rRNA Ribosomal RNA.
S_region Span of the S immunological feature.
satellite Satellite repeated sequence.
scRNA Small cytoplasmic RNA.
sig_peptide Signal peptide coding region.
snRNA Small nuclear RNA.
source Biological source of the specified span of sequence.
stem_loop Hair-pin loop structure in DNA or RNA.
STS Sequence Tagged Site; operationally unique sequence that identifies the combination of primer spans used in a PCR assay.
TATA_signal `TATA box' in eukaryotic promoters.
terminator Sequence causing transcription termination.
transit_peptide Transit peptide coding region.
tRNA Transfer RNA.
unsure Authors are unsure about the sequence in this region.
V_region Span of the V immunological feature.
V_segment Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma chains.
variation A related population contains stable mutation.
virion Virion (encapsidated) viral sequence.
- (hyphen) Placeholder.
-10_signal `Pribnow box' in prokaryotic promoters.
-35_signal `-35 box' in prokaryotic promoters.
3'clip 3'-most region of a precursor transcript removed in processing.
3'UTR 3' untranslated region (trailer).
5'clip 5'-most region of a precursor transcript removed in processing.
5'UTR 5' untranslated region (leader).

Feature Location

   The second column of the feature descriptor line designates the location of the feature in the sequence. Several conventions are used to indicate sequence location.
    Base numbers in location descriptors refer to numbering in the entry, which is not necessarily the same as the numbering scheme used in the published report. The first base in the presented sequence is numbered base 1. Sequences are presented in the 5' to 3' direction.
    Location descriptors can be one of the following:

  1. A single base
  2. A contiguous span of bases
  3. A site between two bases
  4. A single base chosen from a range of bases
  5. A single base chosen from among two or more specified bases
  6. A joining of sequence spans
  7. A reference to an entry other than the one to which the feature belongs (i.e., a remote entry), followed by a location descriptor referring to the remote sequence
  8. A literal sequence (a string of bases enclosed in quotation marks)

    A single base is indicated by its base number (e.g., 467).
    A site between two residues, such as an endonuclease cleavage site, is indicated by listing the two bases separated by a caret (e.g., 23^24).
    A single residue chosen from a range of residues is indicated by the numbers of the first and last bases in the range separated by a single period (e.g., 23.79).
    A contiguous span of bases is indicated by the numbers of the first and last bases in the range separated by two periods (e.g., 23..79).
    The symbols < and > indicate that the end point of the range is beyond the specified base number (e.g., <1..267 or 1..>66).
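    The simple location forms described above can be recognized mechanically. The following Python sketch is illustrative only: the names SimpleLocation, parse_simple_location, and to_slice are hypothetical and not part of any GenBank tool, and operators, remote-entry references, and literal sequences are deliberately left out. It also shows how the entry's 1-based, inclusive numbering might map onto a 0-based Python slice.

```python
import re
from typing import NamedTuple


class SimpleLocation(NamedTuple):
    """Hypothetical container for one simple location descriptor."""
    kind: str                 # "single", "span", "between", or "one_in_range"
    start: int                # 1-based base number, as numbered in the entry
    end: int                  # 1-based, inclusive
    open_start: bool = False  # True when the span is written with '<'
    open_end: bool = False    # True when the span is written with '>'


_SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")   # 23..79, <1..267, 1..>66
_BETWEEN = re.compile(r"^(\d+)\^(\d+)$")          # 23^24
_ONE_IN_RANGE = re.compile(r"^(\d+)\.(\d+)$")     # 23.79
_SINGLE = re.compile(r"^(\d+)$")                  # 467


def parse_simple_location(text: str) -> SimpleLocation:
    """Classify a location descriptor that uses no operators."""
    text = text.strip()
    m = _SPAN.match(text)
    if m:
        return SimpleLocation("span", int(m.group(2)), int(m.group(4)),
                              m.group(1) == "<", m.group(3) == ">")
    m = _BETWEEN.match(text)
    if m:
        return SimpleLocation("between", int(m.group(1)), int(m.group(2)))
    m = _ONE_IN_RANGE.match(text)
    if m:
        return SimpleLocation("one_in_range", int(m.group(1)), int(m.group(2)))
    m = _SINGLE.match(text)
    if m:
        n = int(m.group(1))
        return SimpleLocation("single", n, n)
    raise ValueError(f"unsupported location descriptor: {text!r}")


def to_slice(loc: SimpleLocation) -> slice:
    """Map a single base or span (1-based, inclusive) to a Python slice."""
    if loc.kind not in ("single", "span"):
        raise ValueError(f"{loc.kind} does not denote a run of bases")
    return slice(loc.start - 1, loc.end)
```

    For instance, "acgtacgt"[to_slice(parse_simple_location("2..4"))] selects bases 2 through 4 of the string. The open_start and open_end flags only record that < or > was present; how a partial end point should be treated is left to the caller.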
    Operators are prefixes that specify what must be done to the indicated sequence to locate the feature. The following are the operators available, along with their most common format and a description.

complement (location)
The feature is complementary to the location indicated. Complementary strands are read 5' to 3'.
join (location, location, .. location)
The indicated elements should be placed end to end to form one contiguous sequence.
order (location, location, .. location)
The elements are found in the specified order in the 5' to 3' direction, but nothing is implied about the rationality of joining them.
group (location, location, .. location)
The elements are related and should be grouped together, but no order is implied.
one-of (location, location, .. location)
The element can be any one, but only one, of the items listed.
replace (location, location)
The first location indicated should be replaced by the sequence from the second location; used for insertions, deletions, and variants.
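
To illustrate what the complement and join operators ask a reader to do with the sequence, here is a minimal Python sketch. The helper names (extract_span, complement, join) are hypothetical, the sequence is assumed to be a plain string of unambiguous bases (a, c, g, t), and the other operators are not covered.

```python
# Translation table for unambiguous DNA bases only (an assumption of this
# sketch; ambiguity codes and RNA bases are not handled).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def extract_span(sequence: str, start: int, end: int) -> str:
    """Return bases start..end, using the entry's 1-based inclusive numbering."""
    return sequence[start - 1:end]


def complement(fragment: str) -> str:
    """Return the complementary strand, reversed so that it reads 5' to 3'."""
    return fragment.translate(_COMPLEMENT)[::-1]


def join(fragments) -> str:
    """Place the fragments end to end to form one contiguous sequence."""
    return "".join(fragments)


entry_sequence = "aaccggtt"

# join(1..2,7..8): two spans placed end to end.
joined = join([extract_span(entry_sequence, 1, 2),
               extract_span(entry_sequence, 7, 8)])

# complement(1..4): the opposite strand of bases 1..4, read 5' to 3'.
opposite = complement(extract_span(entry_sequence, 1, 4))
```

With the toy sequence above, joined is "aatt" and opposite is "ggtt". Reversing inside complement is a design choice of this sketch that follows from the rule that complementary strands are read 5' to 3'.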

Feature Qualifiers

   Qualifiers provide additional information about features. They take the form of a slash (/) followed by a qualifier name and, if applicable, an equal sign (=) and a qualifier value. Because qualifiers convey many types of information, their values can take several forms: free text, controlled vocabulary, enumerated values, reference numbers, sequences, or feature labels.
    The following is a list of valid feature qualifiers.

qualifier description
anticodon Location of the anticodon of tRNA and the amino acid for which it codes
bound_moiety Moiety bound
cell_line Cell line from which the sequence was obtained
cell_type Cell type from which the sequence was obtained
chromosome Chromosome from which the sequence was obtained
chloroplast Organelle type from which the sequence was obtained
chromoplast Organelle type from which the sequence was obtained
citation Reference to a citation providing the claim of or evidence for a feature
clone Clone from which the sequence was obtained
clone_lib Clone library from which the sequence was obtained
codon Specifies a codon that is different from any found in the reference genetic code
codon_start Indicates the first base of the first complete codon in a CDS (as 1 or 2 or 3)
cons_splice Identifies intron splice sites that do not conform to the 5'-GT... AG-3' splice site consensus
cultivar Variety of plant from which the sequence was obtained
cyanelle Organelle type from which the sequence was obtained
db_xref Cross-reference to an external database
dev_stage Developmental stage of source organism
direction Direction of DNA replication
EC_number Enzyme Commission number for the enzyme product of the sequence
evidence Value indicating the nature of supporting evidence
frequency Frequency of the occurrence of a feature
function Function attributed to a sequence
gene Symbol of the gene corresponding to a sequence region (usable with all features)
gdb_xref Genome Databank unique ID cross reference qualifier
germline Immunoglobulin unrearranged DNA
haplotype Haplotype of organism from which sequence was obtained
insertion_seq Insertion sequence element from which sequence was obtained
isolate Individual isolate from which sequence was obtained
kinetoplast Organelle type from which sequence was obtained
label A label used to permanently identify a feature
lab_host Laboratory host used to propagate the organism from which sequence was obtained
map Map position of the feature in free-format text
macronuclear Macronuclear DNA
mitochondrion Organelle type from which sequence was obtained
mod_base Abbreviation for a modified nucleotide base
note Any comment or additional information
number A number indicating the order of genetic elements (e.g., exons or introns) in the 5' to 3' direction
organism Name of organism if different from that contained in the entry's ORGANISM field
partial Differentiates between complete regions and partial ones
PCR_conditions PCR reaction conditions and components
pop_variant Population variant from which sequence was obtained
phenotype Phenotype conferred by the feature
plasmid Name of plasmid from which sequence was obtained
product Name of a product encoded by the sequence
proviral Viral sequence integrated into another organism's genome
pseudo Indicates that this feature is a non-functional version of the element named by the feature key
rearranged Immunoglobulin rearranged DNA
rpt_family Type of repeated sequence; Alu or Kpn, for example
rpt_type Organization of repeated sequence
rpt_unit Identity of repeat unit that constitutes a repeat_region
serotype Serotype from which sequence was obtained
sex Sex of organism from which sequence was obtained
sequenced_mol Molecule from which sequence was obtained
specific_host Natural host from which sequence was obtained
standard_name Accepted standard name for this feature
strain Strain from which sequence was obtained
sub_clone Sub-clone from which sequence was obtained
sub_species Sub-species name of organism from which sequence was obtained
sub_strain Sub-strain from which sequence was obtained
tissue_lib Tissue library from which sequence was obtained
tissue_type Tissue type from which sequence was obtained
translation Amino acid translation of coding region (automatically generated)
transl_except Translational exception: single codon, the translation of which does not conform to the reference genetic code
transl_table Genetic code table
transposon Transposable element from which sequence was obtained
type Name of a strain if different from that in the SOURCE field
usedin Indicates that feature is used in a compound feature in another entry
variety Variety from which sequence was obtained
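
The slash, name, equal sign, and value form of a qualifier can be split apart with a few lines of Python. This is a sketch under stated assumptions: parse_qualifier is a hypothetical helper, each qualifier is assumed to fit on one line, and values are returned as text without being checked against the list of valid qualifiers above.

```python
from typing import Optional, Tuple


def parse_qualifier(line: str) -> Tuple[str, Optional[str]]:
    """Split one qualifier line into (name, value).

    The value is None when the qualifier has no equal sign (e.g. /pseudo).
    Enclosing double quotes, when present, are removed from the value.
    """
    text = line.strip()
    if not text.startswith("/"):
        raise ValueError(f"not a qualifier line: {line!r}")
    name, sep, value = text[1:].partition("=")
    if not sep:
        return name, None
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return name, value
```

For example, parse_qualifier('/gene="PGAM-M"') gives ("gene", "PGAM-M"), and parse_qualifier("/pseudo") gives ("pseudo", None). Splitting on the first equal sign only keeps structured values such as (pos:35..37,aa:Leu) intact.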

Feature Table Example 1


     CDS             5..1261

                     /product="alpha-1-antitrypsin precursor"

                     /map="14q32.1"

                     /gene="PI"

     tRNA            1..87

                     /note="Leu-tRNA-CAA (NAR: 1057)"

                     /anticodon=(pos:35..37,aa:Leu)

     mRNA            1..>66

                     /note="alpha-1-acid glycoprotein mRNA"

     transposon      <1..267

                     /note="insertion element IS5"

     misc_recomb     105^106

                     /note="B.subtilis DNA end/IS5 DNA start"

     conflict        replace(258..258,"t")

                     /citation="[2]"
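
A reader processing a block like Example 1 needs to attach each qualifier line to the feature key and location that precede it. The Python sketch below shows one way this grouping might be done. The function parse_feature_table is hypothetical, it tells the two kinds of line apart by a leading slash rather than by column position, and it assumes that every location and every qualifier fits on a single line.

```python
def parse_feature_table(lines):
    """Group descriptor lines into features.

    Returns a list of dicts with the feature key, the location text, and the
    raw qualifier lines (leading slash kept) that follow the descriptor line.
    """
    features = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # blank lines carry no information
        if text.startswith("/"):
            if not features:
                raise ValueError("qualifier found before any feature key")
            features[-1]["qualifiers"].append(text)
        else:
            parts = text.split(None, 1)
            if len(parts) != 2:
                raise ValueError(f"expected a key and a location: {raw!r}")
            key, location = parts
            features.append({"key": key, "location": location,
                             "qualifiers": []})
    return features


example = [
    "     CDS             5..1261",
    '                     /product="alpha-1-antitrypsin precursor"',
    '                     /map="14q32.1"',
    '                     /gene="PI"',
    "     tRNA            1..87",
    '                     /note="Leu-tRNA-CAA (NAR: 1057)"',
    "                     /anticodon=(pos:35..37,aa:Leu)",
]
features = parse_feature_table(example)
```

Here features holds two entries, one for the CDS with three qualifiers and one for the tRNA with two. Locations or qualifier values that continue onto a second line would need additional handling that this sketch does not attempt.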

Feature Table Example 2 -- joining entries


LOCUS       HUMPGAMM1    3688 bp ds-DNA             PRI       15-OCT-1990

DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M)

            gene, 5' end.

ACCESSION   M55673 M25818 M27095

KEYWORDS    phosphoglycerate mutase.

SEGMENT     1 of 2

  .

  .

FEATURES             Location/Qualifiers

     CAAT_signal     1751..1755

                     /gene="PGAM-M"

     TATA_signal     1791..1799

                     /gene="PGAM-M"

     exon            1820..2274

                     /number=1

                     /EC_number="5.4.2.1"

                     /gene="PGAM-M"

     intron          2275..2377

                     /number=1

                     /gene="PGAM2"

     exon            2378..2558

                     /number=2

                     /gene="PGAM-M"

  .

  .

//

LOCUS       HUMPGAMM2     677 bp ds-DNA             PRI       15-OCT-1990

DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M),

            exon 3.

ACCESSION   M55674 M25818 M27096

KEYWORDS    phosphoglycerate mutase.

SEGMENT     2 of 2

  .

  .

FEATURES             Location/Qualifiers

     exon            255..457

                     /number=3

                     /gene="PGAM-M"

     intron          order(M55673:2559..>3688,