Feature Table Information

The feature table describes genes and gene products, along with other regions of biological significance reported in the sequence, including the regions that code for proteins and RNA molecules. It also enumerates differences between different reports of the same sequence and provides cross-references to other data collections, as described in more detail below.

Feature Keys

The first column of the feature descriptor line contains the feature key. The list of valid feature keys is shown below.

keyword	description
allele	Related strain contains alternative gene form.
attenuator	Sequence related to transcription termination.
C_region	Span of the C immunological feature.
CAAT_signal	`CAAT box' in eukaryotic promoters.
CDS	Sequence coding for amino acids in protein (includes stop codon).
conflict	Independent determinations differ.
D-loop	Displacement loop.
D-segment	Diversity segment of immunoglobulin heavy chain and T-cell receptor beta-chain.
enhancer	Cis-acting enhancer of promoter function.
exon	Region that codes for part of spliced mRNA.
GC_signal	`GC box' in eukaryotic promoters.
iDNA	Intervening DNA eliminated by recombination.
intron	Transcribed region excised by mRNA splicing.
J-segment	Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma-chains.
LTR	Long terminal repeat.
mat_peptide	Mature peptide coding region (does not include stop codon).
misc_binding	Miscellaneous binding site.
misc_difference	Miscellaneous difference feature.
misc_feature	Region of biological significance that cannot be described by any other feature.
misc_recomb	Miscellaneous recombination feature.
misc_RNA	Miscellaneous transcript feature not defined by other RNA keys.
misc_signal	Miscellaneous signal.
misc_structure	Miscellaneous DNA or RNA structure.
modified_base	The indicated base is a modified nucleotide.
mRNA	Messenger RNA.
mutation	A mutation alters the sequence here.
N_region	Span of the N immunological feature.
old_sequence	Presented sequence revises a previous version.
polyA_signal	Signal for cleavage & polyadenylation.
polyA_site	Site at which polyadenine is added to mRNA.
precursor_RNA	Any RNA species that is not yet the mature RNA product.
prim_transcript	Primary (unprocessed) transcript.
primer_bind	Non-covalent primer binding site.
promoter	A region involved in transcription initiation.
protein_bind	Non-covalent protein binding site on DNA or RNA.
RBS	Ribosome binding site.
rep_origin	Replication origin for duplex DNA.
repeat_region	Sequence containing repeated subsequences.
repeat_unit	One repeated unit of a repeat_region.
rRNA	Ribosomal RNA.
S_region	Span of the S immunological feature.
satellite	Satellite repeated sequence.
scRNA	Small cytoplasmic RNA.
sig_peptide	Signal peptide coding region.
snRNA	Small nuclear RNA.
source	Biological source of the specified span of sequence.
stem_loop	Hair-pin loop structure in DNA or RNA.
STS	Sequence Tagged Site; operationally unique sequence that identifies the combination of primer spans used in a PCR assay.
TATA_signal	`TATA box' in eukaryotic promoters.
terminator	Sequence causing transcription termination.
transit_peptide	Transit peptide coding region.
tRNA	Transfer RNA.
unsure	Authors are unsure about the sequence in this region.
V_region	Span of the V immunological feature.
V_segment	Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta and gamma chains.
variation	A related population contains stable mutation.
virion	Virion (encapsidated) viral sequence.
- (hyphen)	Placeholder.
-10_signal	`Pribnow box' in prokaryotic promoters.
-35_signal	`-35 box' in prokaryotic promoters.
3'clip	3'-most region of a precursor transcript removed in processing.
3'UTR	3' untranslated region (trailer).
5'clip	5'-most region of a precursor transcript removed in processing.
5'UTR	5' untranslated region (leader).

Feature Location

The second column of the feature descriptor line designates the location of the feature in the sequence. Several conventions are used to indicate sequence location.
Base numbers in location descriptors refer to numbering in the entry, which is not necessarily the same as the numbering scheme used in the published report. The first base in the presented sequence is numbered base 1. Sequences are presented in the 5' to 3' direction.
Location descriptors can be one of the following:

A single base
A contiguous span of bases
A site between two bases
A single base chosen from a range of bases
A single base chosen from among two or more specified bases
A joining of sequence spans
A reference to an entry other than the one to which the feature belongs (i.e., a remote entry), followed by a location descriptor referring to the remote sequence
A literal sequence (a string of bases enclosed in quotation marks)

A site between two residues, such as an endonuclease cleavage site, is indicated by listing the two bases separated by a caret (e.g., 23^24).
A single residue chosen from a range of residues is indicated by the numbers of the first and last bases in the range separated by a single period (e.g., 23.79).
The symbols < and > indicate that an end point of the range lies beyond the specified base number (e.g., <1..267 or 1..>66).
A contiguous span of bases is indicated by the numbers of the first and last bases in the range separated by two periods (e.g., 23..79).
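The simple location forms described above can be recognized mechanically. The following Python sketch is illustrative only and is not part of the format definition: the names `Location` and `parse_location` are hypothetical, the pattern assumes that a remote entry is written as an accession followed by a colon (as in M55673:2559..>3688), and operators and literal sequences are not handled.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    kind: str                     # "base", "span", "between", or "one_of_range"
    start: int
    end: int
    start_open: bool = False      # True when the first number is preceded by "<"
    end_open: bool = False        # True when the last number is preceded by ">"
    entry: Optional[str] = None   # accession of a remote entry, if one is given


# Assumed grammar: [entry:][<]start[(.. | . | ^)[>]end]
_LOCATION = re.compile(
    r"^(?:(?P<entry>[A-Za-z0-9_]+):)?"
    r"(?P<lt><)?(?P<start>\d+)"
    r"(?:(?P<sep>\.\.|\.|\^)(?P<gt>>)?(?P<end>\d+))?$"
)

_KINDS = {"..": "span", ".": "one_of_range", "^": "between", None: "base"}


def parse_location(text: str) -> Location:
    """Parse one simple location descriptor (no operators, no literals)."""
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location descriptor: {text!r}")
    start = int(match["start"])
    # A single base has no separator; treat it as a range of length one.
    end = int(match["end"]) if match["end"] else start
    return Location(
        kind=_KINDS[match["sep"]],
        start=start,
        end=end,
        start_open=match["lt"] is not None,
        end_open=match["gt"] is not None,
        entry=match["entry"],
    )
```

For example, `parse_location("23^24")` yields a location of kind "between", while `parse_location("1..>66")` yields a span whose `end_open` flag is set.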
Operators are prefixes that specify what must be done to the indicated sequence to locate the feature. The following are the operators available, along with their most common format and a description.

complement (location): The feature is complementary to the location indicated. Complementary strands are read 5' to 3'.
join (location, location, ... location): The indicated elements should be placed end to end to form one contiguous sequence.
order (location, location, ... location): The elements are found in the specified order in the 5' to 3' direction, but nothing is implied about whether it is reasonable to join them.
group (location, location, ... location): The elements are related and should be grouped together, but no order is implied.
one-of (location, location, ... location): The element can be any one, but only one, of the items listed.
replace (location, location): The first location indicated should be replaced by the sequence from the second location; used for insertions, deletions, and variants.
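
To make the effect of the operators concrete, the following Python sketch applies complement, join, and replace to a short made-up sequence. The function names and the sample sequence are hypothetical. The sketch assumes that complement means the reverse complement, so that the result again reads 5' to 3', and that the second argument of replace is a literal string of bases.

```python
# Translation table for the four unambiguous bases, in either case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def span(sequence: str, first: int, last: int) -> str:
    """Return bases first..last using 1-based, inclusive entry numbering."""
    return sequence[first - 1:last]


def complement(fragment: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return fragment.translate(_COMPLEMENT)[::-1]


def join(*fragments: str) -> str:
    """Place the fragments end to end as one contiguous sequence."""
    return "".join(fragments)


def replace(sequence: str, first: int, last: int, new_bases: str) -> str:
    """Return the sequence with bases first..last replaced by new_bases."""
    return sequence[:first - 1] + new_bases + sequence[last:]


sequence = "aaccggttacgt"  # made-up 12-base sequence

print(join(span(sequence, 1, 4), span(sequence, 9, 12)))  # join(1..4,9..12)
print(complement(span(sequence, 1, 4)))                   # complement(1..4)
print(replace(sequence, 3, 3, "t"))                       # replace(3..3,"t")
```

Because `replace` accepts a new string of any length, including an empty one, the same function covers substitutions, insertions, and deletions.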

Feature Qualifiers

Qualifiers provide additional information about features. They take the form of a slash (/) followed by a qualifier name and, if applicable, an equal sign (=) and a qualifier value. Because qualifiers convey many types of information, their values can be free text, controlled vocabulary terms, enumerated values, reference numbers, sequences, or feature labels.
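The slash, name, and optional value can be separated with a few lines of Python. This is a minimal sketch under stated assumptions: `parse_qualifier` is a hypothetical helper, the value is returned as uninterpreted text, and a qualifier written without an equal sign (assumed here to be possible for flags such as /pseudo) is reported with a value of `None`.

```python
from typing import Optional, Tuple


def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/name=value' into (name, value); value is None if absent."""
    text = text.strip()
    if not text.startswith("/"):
        raise ValueError(f"not a qualifier: {text!r}")
    name, equals, value = text[1:].partition("=")
    if not equals:
        return name, None          # qualifier with no value
    # Remove one pair of enclosing double quotes, if present.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return name, value
```

For example, `parse_qualifier('/gene="PI"')` returns the name `gene` with the value `PI`, and `parse_qualifier("/number=1")` returns the value as the string `"1"`; converting values to numbers or locations is left to the caller.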
The following is a list of valid feature qualifiers.

qualifier	description
anticodon	Location of the anticodon of tRNA and the amino acid for which it codes
bound_moiety	Moiety bound
cell_line	Cell line from which the sequence was obtained
cell_type	Cell type from which the sequence was obtained
chromosome	Chromosome from which the sequence was obtained
chloroplast	Organelle type from which the sequence was obtained
chromoplast	Organelle type from which the sequence was obtained
citation	Reference to a citation providing the claim of or evidence for a feature
clone	Clone from which the sequence was obtained
clone_lib	Clone library from which the sequence was obtained
codon	Specifies a codon that is different from any found in the reference genetic code
codon_start	Indicates the first base of the first complete codon in a CDS (as 1 or 2 or 3)
cons_splice	Identifies intron splice sites that do not conform to the 5'-GT... AG-3' splice site consensus
cultivar	Variety of plant from which the sequence was obtained
cyanelle	Organelle type from which the sequence was obtained
db_xref	Cross-reference to an external database
dev_stage	Developmental stage of source organism
direction	Direction of DNA replication
EC_number	Enzyme Commission number for the enzyme product of the sequence
evidence	Value indicating the nature of supporting evidence
frequency	Frequency of the occurrence of a feature
function	Function attributed to a sequence
gene	Symbol of the gene corresponding to a sequence region (usable with all features)
gdb_xref	Genome Databank unique ID cross reference qualifier
germline	Immunoglobulin unrearranged DNA
haplotype	Haplotype of organism from which sequence was obtained
insertion_seq	Insertion sequence element from which sequence was obtained
isolate	Individual isolate from which sequence was obtained
kinetoplast	Organelle type from which sequence was obtained
label	A label used to permanently identify a feature
lab_host	Laboratory host used to propagate the organism from which sequence was obtained
map	Map position of the feature in free-format text
macronuclear	Macronuclear DNA
mitochondrion	Organelle type from which sequence was obtained
mod_base	Abbreviation for a modified nucleotide base
note	Any comment or additional information
number	A number indicating the order of genetic elements (e.g., exons or introns) in the 5' to 3' direction
organism	Name of organism if different from that contained in the entry's ORGANISM field
partial	Differentiates between complete regions and partial ones
PCR_conditions	PCR reaction conditions and components
pop_variant	Population variant from which sequence was obtained
phenotype	Phenotype conferred by the feature
plasmid	Name of plasmid from which sequence was obtained
product	Name of a product encoded by the sequence
proviral	Viral sequence integrated into another organism's genome
pseudo	Indicates that this feature is a non-functional version of the element named by the feature key
rearranged	Immunoglobulin rearranged DNA
rpt_family	Type of repeated sequence; Alu or Kpn, for example
rpt_type	Organization of repeated sequence
rpt_unit	Identity of repeat unit that constitutes a repeat_region
serotype	Serotype from which sequence was obtained
sex	Sex of organism from which sequence was obtained
sequenced_mol	Molecule from which sequence was obtained
specific_host	Natural host from which sequence was obtained
standard_name	Accepted standard name for this feature
strain	Strain from which sequence was obtained
sub_clone	Sub-clone from which sequence was obtained
sub_species	Sub-species name of organism from which sequence was obtained
sub_strain	Sub-strain from which sequence was obtained
tissue_lib	Tissue library from which sequence was obtained
tissue_type	Tissue type from which sequence was obtained
translation	Amino acid translation of coding region (automatically generated)
transl_except	Translational exception: single codon, the translation of which does not conform to the reference genetic code
transl_table	Genetic code table
transposon	Transposable element from which sequence was obtained
type	Name of a strain if different from that in the SOURCE field
usedin	Indicates that feature is used in a compound feature in another entry
variety	Variety from which sequence was obtained

Feature Table Example 1


     CDS             5..1261

                     /product="alpha-1-antitrypsin precursor"

                     /map="14q32.1"

                     /gene="PI"

     tRNA            1..87

                     /note="Leu-tRNA-CAA (NAR: 1057)"

                     /anticodon=(pos:35..37,aa:Leu)

     mRNA            1..>66

                     /note="alpha-1-acid glycoprotein mRNA"

     transposon      <1..267

                     /note="insertion element IS5"

     misc_recomb     105^106

                     /note="B.subtilis DNA end/IS5 DNA start"

     conflict        replace(258..258,"t")

                     /citation="[2]"
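
A table laid out like Example 1 can be read into a simple list of features. The Python sketch below is one possible approach, not a reference parser: `parse_feature_table` is a hypothetical name, lines are split on whitespace rather than on fixed column positions, and locations or qualifier values that continue onto a second line are not supported.

```python
from typing import Dict, List, Tuple

# (feature key, location text, qualifiers keyed by name)
Feature = Tuple[str, str, Dict[str, str]]


def parse_feature_table(text: str) -> List[Feature]:
    """Group each feature key and location with the qualifiers below it."""
    features: List[Feature] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                              # ignore blank lines
        if line.startswith("/"):
            if not features:
                raise ValueError("qualifier appears before any feature key")
            name, _, value = line[1:].partition("=")
            # Attach the qualifier to the most recent feature.
            features[-1][2][name] = value.strip('"')
        else:
            parts = line.split(None, 1)           # key, then location
            if len(parts) != 2:
                raise ValueError(f"feature line lacks a location: {raw!r}")
            features.append((parts[0], parts[1], {}))
    return features


SAMPLE = """\
     transposon      <1..267
                     /note="insertion element IS5"
     misc_recomb     105^106
                     /note="B.subtilis DNA end/IS5 DNA start"
"""

for key, location, qualifiers in parse_feature_table(SAMPLE):
    print(key, location, qualifiers)
```

A dictionary keeps only the last value when a qualifier name repeats within one feature; a list of name and value pairs would be the safer choice if repeated qualifiers need to be preserved.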

Feature Table Example 2 -- joining entries


LOCUS       HUMPGAMM1    3688 bp ds-DNA             PRI       15-OCT-1990

DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M)

            gene, 5' end.

ACCESSION   M55673 M25818 M27095

KEYWORDS    phosphoglycerate mutase.

SEGMENT     1 of 2

  .

  .

FEATURES             Location/Qualifiers

     CAAT_signal     1751..1755

                     /gene="PGAM-M"

     TATA_signal     1791..1799

                     /gene="PGAM-M"

     exon            1820..2274

                     /number=1

                     /EC_number="5.4.2.1"

                     /gene="PGAM-M"

     intron          2275..2377

                     /number=1

                     /gene="PGAM2"

     exon            2378..2558

                     /number=2

                     /gene="PGAM-M"

  .

  .

//

LOCUS       HUMPGAMM2     677 bp ds-DNA             PRI       15-OCT-1990

DEFINITION  Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M),

            exon 3.

ACCESSION   M55674 M25818 M27096

KEYWORDS    phosphoglycerate mutase.

SEGMENT     2 of 2

  .

  .

FEATURES             Location/Qualifiers

     exon            255..457

                     /number=3

                     /gene="PGAM-M"

     intron          order(M55673:2559..>3688,