Next-generation sequencing (NGS) technology has advanced rapidly and generated massive data volumes. To align and map NGS data, biologists often select aligners arbitrarily, without considering their suitability, performance, and accuracy, or the sequence variations and polymorphisms present in the reference genome.

With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are in progress or available as drafts but have not been released as satisfactory, usable genomes. Correctness assessment, however, depends on a reference and is therefore not applicable to de novo assembly projects. We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of the reads and presents the analysis in a portable HTML report whose interface visualizes the distribution of high- and poor-quality reads. We categorize reads into uniquely mapped, multiply mapped, and unmapped groups, and further divide uniquely mapped reads into perfectly matched reads, reads with substitutions, reads containing clips, and others. Finally, we evaluate SQUAT on six datasets, including the genome assemblies of an eel, a worm, a mushroom, and three bacteria.
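The read categories described above can be illustrated with a minimal sketch over simplified SAM records. This is not SQUAT's implementation; it assumes each record is a hypothetical `(read_name, flag, cigar, nm)` tuple, that an aligner reports extra hits as secondary alignments (FLAG 0x100) rather than, for example, in an `XA` tag, and that `nm` is the value of the NM (edit distance) tag. The function names are illustrative only.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Return the set of operation letters in a CIGAR string, e.g. '5S95M' -> {'S', 'M'}."""
    return {op for _, op in CIGAR_OP.findall(cigar)}


def classify_reads(records):
    """Map each read name to a category (hypothetical sketch, not SQUAT's code).

    records: iterable of (read_name, flag, cigar, nm) tuples, where flag is the
    SAM FLAG integer and nm is the NM tag value.
    """
    hits = {}     # read name -> number of primary + secondary alignments
    primary = {}  # read name -> (cigar, nm) of the primary alignment
    for name, flag, cigar, nm in records:
        hits.setdefault(name, 0)
        # 0x4: read unmapped; 0x800: supplementary part of a chimeric alignment.
        if flag & 0x4 or flag & 0x800:
            continue
        hits[name] += 1
        if not flag & 0x100:  # 0x100 marks a secondary alignment
            primary[name] = (cigar, nm)

    categories = {}
    for name, n in hits.items():
        if n == 0:
            categories[name] = "unmapped"
        elif n > 1:
            categories[name] = "multiply_mapped"
        elif name not in primary:
            categories[name] = "other"  # malformed input: no primary record
        else:
            cigar, nm = primary[name]
            ops = cigar_ops(cigar)
            if ops & {"S", "H"}:
                categories[name] = "clipped"
            elif ops <= {"M", "="} and nm == 0:
                categories[name] = "perfect"
            elif ops <= {"M", "=", "X"}:
                # No indels in the CIGAR, so NM counts only mismatches.
                categories[name] = "substitutions"
            else:
                categories[name] = "other"  # indels, skipped regions, etc.
    return categories
```

Checking clips before substitutions is a design choice in this sketch: a clipped read that also carries mismatches is reported once, as clipped.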

Next-generation sequencing (NGS) technologies, which parallelize the sequencing process and produce thousands to millions, or even hundreds of millions, of sequences in a single run, have revolutionized genomic and genetic research. Here we present FastQ Quality Control Software (FaQCs), a novel tool that rapidly processes large volumes of data and improves on previous solutions for monitoring quality and removing poor-quality data from sequencing runs. Both processing speed and the memory footprint needed to store all required information have been optimized through algorithmic and parallel-processing solutions. The automated PDF output includes a side-by-side comparison of the trimmed data with the original data. We illustrate how the tool helps downstream analysis with a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism (SNP) identification, and improved de novo assembly metrics.
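To make "removing poor-quality data" concrete, here is a generic end-trimming sketch for a single FASTQ read. It is not FaQCs' algorithm, which is not described here; the Phred+33 encoding, the thresholds `min_q` and `min_len`, and the function names are all assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, quality, min_q=20, min_len=50):
    """Trim low-quality bases from both ends of a read (hypothetical sketch).

    Bases scoring below min_q are removed from the 5' and 3' ends.
    Returns (trimmed_seq, trimmed_quality), or None if fewer than
    min_len bases remain, i.e. the read would be discarded.
    """
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], quality[start:end]
```

For example, `trim_read("ACGTACGTAC", "##IIIIII##", min_len=5)` keeps the six high-quality middle bases (`'#'` decodes to Phred 2 and `'I'` to Phred 40). Real tools typically add sliding-window or cumulative-quality rules on top of simple end trimming.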

High-throughput molecular analysis is a well-established technology that plays an important role in exploring biological questions in many species, especially in human genomic studies. Over the past 20 years, gene expression profiling, a revolutionary technique, has been widely used for genomic identification, genetic testing, drug discovery, and disease diagnosis, among other applications 1. Genomics and proteomics research has changed rapidly as a result of next-generation sequencing (NGS), a paradigm-shifting technology that provides higher accuracy, greater throughput, and more applications than the microarray platform 2-4. Massively parallel sequencing has increasingly been the object of study in recent years. NGS technologies are used for several applications, including whole-genome sequencing, de novo assembly, resequencing, and transcriptome sequencing at the DNA or RNA level. For instance, de novo assembly reconstructs the genome of a particular organism without a reference genome sequence 5, which may lead to a better understanding at the genomic level and may assist in predicting genes, protein-coding regions, and pathways. In addition, resequencing an organism with a known genome can help clarify the relationship between genotype and phenotype and identify differences from the reference sequence 6, 7.

Kwan and Binbin Wang and X. Ma and Y. Next-generation high-throughput DNA sequencing technologies have advanced sequence-based genomic research and enabled novel biological applications, with the promise of sequencing DNA at unprecedented speed.

Next-generation sequencing holds potential for improving clinical and public health microbiology. In time, laboratories may be able to replace many traditional microbiology processes with a single workflow that accommodates a wide array of pathogens. Next-generation sequencing is a versatile technology, broadly applicable to viruses, bacteria, fungi, parasites, animal vectors, and human hosts. Although microbial genomes are generally smaller and less complex than human genomes, long-read sequencing technologies such as single-molecule real-time sequencing are useful for constructing complete, highly accurate genomes and sorting out plasmids, repeats, and other complex regions. A different approach, nanopore sequencing, relies on threading individual DNA or RNA molecules through engineered protein nanopores and monitoring the electric current across each pore. The first such commercially available instrument offers relatively long sequence reads and allows data analysis to begin while sequencing is still in progress. Early limitations in throughput and accuracy have been mitigated by continued improvements in hardware and reagents.


DNA sequencing is the process of determining the nucleic acid sequence, that is, the order of nucleotides in DNA. It includes any method or technology used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences has become indispensable for basic biological research and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology, and biological systematics. Comparing healthy and mutated DNA sequences can diagnose different diseases, including various cancers, [3] characterize the antibody repertoire, [4] and guide patient treatment. The speed of modern DNA sequencing technology has been instrumental in sequencing complete DNA sequences, or genomes, of numerous types and species of life, including the human genome and the complete DNA sequences of many animal, plant, and microbial species. The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography.
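Comparing a healthy (reference) sequence with a mutated one reduces, in the simplest case, to listing the positions where the bases differ. The sketch below assumes the two sequences are already aligned, have equal length, and contain only uppercase A, C, G, and T; real variant calling must also handle gaps, ambiguity codes, and sequencing errors. The function name is hypothetical and positions are 0-based.

```python
VALID_BASES = set("ACGT")


def find_substitutions(reference, sample):
    """List (position, ref_base, sample_base) where two aligned sequences differ.

    Assumes equal-length, pre-aligned sequences made only of A, C, G, T.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    for seq in (reference, sample):
        if not set(seq) <= VALID_BASES:
            raise ValueError("unexpected character; only A, C, G, T are allowed")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]
```

For example, comparing `"ACGTAC"` with `"ACTTAG"` reports a G-to-T change at position 2 and a C-to-G change at position 5.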

SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies


NGS is the method of choice for large-scale genomic and transcriptomic sequencing because it produces sequencing data in the gigabase range per instrument run, at lower cost than the traditional first-generation Sanger sequencing method.
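The "gigabase range per run" figure follows from simple arithmetic: run yield is the number of reads times the read length, and dividing that yield by the genome size gives the expected mean depth of coverage (C = N × L / G, as in the Lander-Waterman model). The read counts, read length, and genome size below are illustrative assumptions, not values from the source, and the function names are hypothetical.

```python
def run_yield_gb(num_reads, read_length):
    """Total sequenced bases in one run, in gigabases (1 Gb = 1e9 bases)."""
    return num_reads * read_length / 1e9


def mean_coverage(num_reads, read_length, genome_size):
    """Expected mean depth of coverage, C = N * L / G."""
    return num_reads * read_length / genome_size


# Illustrative numbers only: 400 million reads of 150 bp on a 3 Gb genome.
reads, length, genome = 400_000_000, 150, 3_000_000_000
print(run_yield_gb(reads, length))           # 60.0 Gb per run
print(mean_coverage(reads, length, genome))  # 20.0x mean coverage
```

Mean coverage is only an average; actual per-base depth varies with GC content, repeats, and library bias.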

