Accession numbers (or accessions) uniquely identify database records or individual units of data. You will encounter accessions in NCBI databases that serve as repositories of nucleotide and protein sequences and other molecular data.
Accession number formats
NCBI staff and collaborators set the rules for what accession numbers look like for each data type; in other words, they establish accession formats. Most accession formats consist of an alphabetical prefix followed by a series of digits, and some also include a version number. Let’s take a closer look at the two types: (1) accessions without version numbers and (2) accessions with version numbers.
1. Accessions without version numbers have the following generic format:
[alphabetical prefix] [series of digits]
Examples of databases that use accessions without versioning are BioProject, BioSample, Sequence Read Archive (SRA), GEO DataSets, and dbSNP.
The prefixes carry meaning and help you recognize both the database and the kind of data. For example, PRJNA1173348 is an accession for a registered study in the BioProject database. Its “PRJNA” prefix tells us that the researchers registered their study with NCBI. Studies registered with the collaborating ENA or DDBJ databases* instead carry accessions with “PRJEB” and “PRJDB” prefixes, respectively.
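To make the prefix-plus-digits format concrete, here is a minimal Python sketch that splits an unversioned accession into its two parts and looks up which archive registered a BioProject. The names `split_unversioned`, `bioproject_archive`, and `BIOPROJECT_ARCHIVES` are hypothetical, and the lookup table holds only the three BioProject prefixes mentioned above; it is an assumption for illustration, not an official or complete list of NCBI prefixes.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup table: only the three BioProject prefixes named in
# the text above, not an exhaustive list of accession prefixes.
BIOPROJECT_ARCHIVES = {
    "PRJNA": "NCBI",
    "PRJEB": "ENA",
    "PRJDB": "DDBJ",
}

# Unversioned format: an alphabetical prefix followed by a series of digits.
# Lowercase letters are allowed so prefixes such as dbSNP's "rs" also match.
UNVERSIONED = re.compile(r"^([A-Za-z]+)([0-9]+)$")


def split_unversioned(accession: str) -> Tuple[str, str]:
    """Split e.g. 'PRJNA1173348' into ('PRJNA', '1173348')."""
    match = UNVERSIONED.match(accession)
    if match is None:
        raise ValueError(f"not an unversioned accession: {accession!r}")
    return match.group(1), match.group(2)


def bioproject_archive(accession: str) -> Optional[str]:
    """Return the archive that registered a BioProject, or None if unknown."""
    prefix, _digits = split_unversioned(accession)
    return BIOPROJECT_ARCHIVES.get(prefix)


if __name__ == "__main__":
    print(split_unversioned("PRJNA1173348"))   # ('PRJNA', '1173348')
    print(bioproject_archive("PRJNA1173348"))  # NCBI
```

The digits are kept as a string rather than converted to an integer, because any leading zeros are part of the accession and must be preserved.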
2. Accessions with version numbers have the following generic format:
[alphabetical prefix] [series of digits or alphanumeric characters] [.] [version number]
Versions track sequence updates while the base of the accession number stays the same. For example, the version suffix "4" in RefSeq** accession NM_000680.4 indicates that the sequence in the record has been updated three times since it was first released as version 1. Note that the version does not change for updates that leave the sequence untouched, such as changes to publication references or sequence source information. NCBI uses versioning for sequences in the Nucleotide and Protein databases and for genome assemblies in NCBI Datasets. The ClinVar database also assigns versions to its clinical variant records.
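The version arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch, not an NCBI tool: `parse_versioned`, `sequence_updates`, and `same_record` are hypothetical helper names. It splits on the last dot, following the generic `[prefix][digits or alphanumerics][.][version]` format, and assumes versions start at 1.

```python
from typing import NamedTuple


class VersionedAccession(NamedTuple):
    base: str     # e.g. "NM_000680"; stays the same across sequence updates
    version: int  # e.g. 4; increases only when the sequence itself changes


def parse_versioned(accession: str) -> VersionedAccession:
    """Split 'NM_000680.4' into its base accession and integer version."""
    base, dot, version = accession.rpartition(".")
    # Require a non-empty base and a plain ASCII integer after the last dot.
    if not dot or not base or not (version.isascii() and version.isdigit()):
        raise ValueError(f"not a versioned accession: {accession!r}")
    return VersionedAccession(base, int(version))


def sequence_updates(accession: str) -> int:
    """Number of sequence updates since the first release (version 1)."""
    return parse_versioned(accession).version - 1


def same_record(a: str, b: str) -> bool:
    """True if two versioned accessions share the same base accession."""
    return parse_versioned(a).base == parse_versioned(b).base


if __name__ == "__main__":
    print(parse_versioned("NM_000680.4"))   # VersionedAccession(base='NM_000680', version=4)
    print(sequence_updates("NM_000680.4"))  # 3
```

Comparing only the base with `same_record` tells you two accessions refer to the same record, while comparing the full string tells you whether they refer to the same sequence.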
What can you do with accessions?
Accessions are your data currency. Use them to retrieve (manually or programmatically) the records that you need in the database. Include them in your writing and communications with professional colleagues. Make sure that you include the version number if given, so you can always refer to the correct sequence!
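For programmatic retrieval, one common route is NCBI’s E-utilities EFetch endpoint, which accepts an accession (including its version) in the `id` parameter. The sketch below uses only the Python standard library; the helper names `efetch_url` and `fetch_fasta` are hypothetical, and the choice of FASTA output (`rettype=fasta`, `retmode=text`) is one option among several. Use `db="nuccore"` for Nucleotide records and `db="protein"` for Protein records.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore") -> str:
    """Build an EFetch URL that asks for the record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str, db: str = "nuccore") -> str:
    """Download the FASTA record for one accession (needs network access)."""
    with urlopen(efetch_url(accession, db), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Passing the full versioned accession should pin the exact sequence.
    print(fetch_fasta("NM_000680.4")[:70])
```

For anything beyond a few requests, check the E-utilities documentation for current rate limits and for registering an API key.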
Where can you learn more?
Knowledge articles: