The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort to collect and disseminate databases containing DNA and RNA sequences.[1] It involves the following computerized databases: NIG's DNA Data Bank of Japan (Japan), NCBI's GenBank (USA) and the EMBL-EBI's European Nucleotide Archive (EMBL). New and updated data on nucleotide sequences contributed by research teams to each of the three databases are synchronized on a daily basis through continuous interaction between the staff at each of the collaborating organizations.
All of the data in INSDC are available for free and unrestricted access, for any purpose, with no restrictions on analysis, redistribution, or re-publication. This policy has been a foundational principle of the INSDC since its inception.[2] The official policy statement can be found at http://www.insdc.org/.[3] Since the 1990s, most of the world's major scientific journals have required that sequence data be deposited in an INSDC database as a precondition for publication.
The DDBJ/EMBL-EBI/GenBank synchronization is maintained according to a number of guidelines which are produced and published by an International Advisory Board.[4] The guidelines consist of a common definition of the feature tables[5] for the databases, which regulate the content and syntax of the database entries,[6] in the form of a common DTD (Document Type Definition).
The syntax is called INSDSeq, and its core consists of the letter sequence of the gene expression product (amino acid sequence) and the letter sequence of the nucleotide bases in the gene or decoded segment. A DBFetch operation shows a typical INSD entry at the EMBL-EBI database;[7] the same entry can also be retrieved at NCBI.[8]
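As a rough illustration of the two core parts of an INSDSeq entry described above, the following Python sketch reads the nucleotide sequence and any amino acid translation from a small XML record. The record is hand-written and hypothetical (the locus name and sequences are invented), the function name summarize_entries is chosen only for this example, and the element names are assumed to follow the INSDSeq DTD; real entries contain many more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written record: the locus name and sequences are
# invented for illustration and do not correspond to a real INSDC entry.
# Element names are assumed to follow the INSDSeq DTD.
SAMPLE_INSDSEQ = """\
<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>EXAMPLE1</INSDSeq_locus>
    <INSDSeq_length>12</INSDSeq_length>
    <INSDSeq_moltype>DNA</INSDSeq_moltype>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>CDS</INSDFeature_key>
        <INSDFeature_location>1..12</INSDFeature_location>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>translation</INSDQualifier_name>
            <INSDQualifier_value>MKL</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
    <INSDSeq_sequence>atgaaactgtaa</INSDSeq_sequence>
  </INSDSeq>
</INSDSet>
"""


def summarize_entries(xml_text):
    """Return locus, nucleotide sequence and CDS translations per entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for seq in root.iter("INSDSeq"):
        # Collect the amino acid sequences stored as "translation"
        # qualifiers on coding-sequence (CDS) features.
        translations = [
            qual.findtext("INSDQualifier_value")
            for feature in seq.iter("INSDFeature")
            if feature.findtext("INSDFeature_key") == "CDS"
            for qual in feature.iter("INSDQualifier")
            if qual.findtext("INSDQualifier_name") == "translation"
        ]
        entries.append({
            "locus": seq.findtext("INSDSeq_locus"),
            "nucleotides": seq.findtext("INSDSeq_sequence"),
            "translations": translations,
        })
    return entries


if __name__ == "__main__":
    for entry in summarize_entries(SAMPLE_INSDSEQ):
        print(entry["locus"], entry["nucleotides"], entry["translations"])
```

The sketch keeps the sample record inline so that it runs without network access; the same function could in principle be applied to an entry retrieved from one of the three databases, provided that entry is exported in the INSDSeq XML form.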