Full genome completed
Researchers have published details of the first ever complete, gapless sequence of a human genome.
It has been 22 years since scientists announced they had put together the first genetic blueprint of a human being, mapping most of the roughly 3 billion units of DNA that make up the human genome. However, this was only a first draft.
A team from the US has now completed the job, producing the first truly complete sequence of a human genome, covering each chromosome from end to end with no gaps and with unprecedented accuracy.
The findings are accessible through the UCSC Genome Browser and are described in six papers published in the latest edition of Science.
Since the first working draft of a human genome sequence was assembled in 2000, genomics research has led to enormous advances in our understanding of human biology and disease.
Nevertheless, crucial regions accounting for some 8 per cent of the human genome have remained hidden from scientists for over 20 years due to the limitations of DNA sequencing technologies.
In 2019, a working group, the Telomere-to-Telomere (T2T) Consortium, was established to fill in the missing pieces. Its efforts have now paid off.
The new reference genome, called T2T-CHM13, adds nearly 200 million base pairs of novel DNA sequences, including 99 genes likely to code for proteins and nearly 2,000 candidate genes that need further study. It also corrects thousands of structural errors in the current reference sequence.
The gaps now filled by the new sequence include the entire short arms of five human chromosomes and cover some of the most complex regions of the genome.
These include highly repetitive DNA sequences found in and around important chromosomal structures such as the telomeres at the ends of chromosomes and the centromeres that coordinate the separation of replicated chromosomes during cell division.
The new sequence also reveals previously undetected segmental duplications, long stretches of DNA that are duplicated in the genome and are known to play important roles in evolution and disease.
“These parts of the human genome that we haven’t been able to study for 20-plus years are important to our understanding of how the genome works, genetic diseases, and human diversity and evolution,” says Dr Karen Miga, assistant professor of biomolecular engineering at UC Santa Cruz and founding member of T2T.
Many of the newly revealed regions have important functions in the genome even if they do not include active genes.
The new T2T reference genome will complement the standard human reference genome, known as Genome Reference Consortium build 38 (GRCh38), which had its origins in the publicly funded Human Genome Project and has been continually updated since the first draft in 2000.
The T2T Consortium has now joined with the Human Pangenome Reference Consortium, which aims to create a new “human pangenome reference” based on the complete genome sequences of 350 individuals.
The six papers detailing the findings are: “The complete sequence of a human genome,” “Complete genomic and epigenetic maps of human centromeres,” “Epigenetic patterns in a complete human genome,” “Segmental duplications and their variation in a complete human genome,” “A complete reference genome improves analysis of human genetic variation,” and “From telomere to telomere: the transcriptional and epigenetic state of human repeat elements.”