Long-read sequencing reveals genomic structural variations that underlie creation of Quality Protein Maize.
Changsheng Li, Xiaoli Xiang, Yongcai Huang, Yong Zhou, Dong An, Jiaqiang Dong, Chenxi Zhao, Hongjun Liu, Yubin Li, Qiong Wang, Chunguang Du, Joachim Messing, Brian A. Larkins, Yongrui Wu, Wenqin Wang
Assembly methods: PacBio reads were assembled with Falcon to produce contigs, Bionano optical maps were used to build scaffolds, and B73 v4 was used to construct pseudomolecules.
Construction of pseudomolecules: Yes
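In reference-guided approaches, constructing pseudomolecules generally means assigning each scaffold to a reference chromosome, then ordering and orienting it by where it aligns. The record does not say how this step was carried out for K0326Y. The following is therefore only a minimal Python sketch under assumed inputs: `ScaffoldHit`, `order_scaffolds`, and the anchor positions are hypothetical, and the scaffolds are assumed to have already been aligned to B73 v4 upstream.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class ScaffoldHit:
    """Hypothetical alignment anchor of a scaffold on a reference such as B73 v4."""
    scaffold: str
    chrom: str
    ref_pos: int       # anchor position on the reference chromosome
    same_strand: bool  # True if the anchor aligns in the forward orientation


def order_scaffolds(hits):
    """Return {chrom: [(scaffold, orientation), ...]} ordered along each chromosome."""
    by_scaffold = {}
    for h in hits:
        by_scaffold.setdefault(h.scaffold, []).append(h)

    layout = {}
    for name, anchors in by_scaffold.items():
        # Place the scaffold on the chromosome holding most of its anchors.
        chrom = Counter(a.chrom for a in anchors).most_common(1)[0][0]
        on_chrom = [a for a in anchors if a.chrom == chrom]
        # Use the median anchor position so a few stray anchors do not move it.
        pos = median(a.ref_pos for a in on_chrom)
        # Orient by majority vote over anchor strands (ties go to "+").
        forward = 2 * sum(a.same_strand for a in on_chrom) >= len(on_chrom)
        layout.setdefault(chrom, []).append((pos, name, "+" if forward else "-"))

    return {chrom: [(name, orient) for _, name, orient in sorted(entries)]
            for chrom, entries in layout.items()}
```

Using the median anchor position makes the ordering tolerant of a few misplaced anchors. A real pipeline would also have to deal with chimeric scaffolds and with scaffolds that cannot be placed at all.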
A contig is a contiguous consensus sequence that is derived from a collection of overlapping reads.
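To make the definition concrete, here is a toy Python sketch that builds contigs by greedily merging the pair of reads with the longest exact suffix-prefix overlap. This is not how Falcon works: Falcon first error-corrects the noisy PacBio reads and then assembles them with a string graph. The sketch assumes error-free reads, no read contained within another, and a hypothetical minimum overlap length `min_len`.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a matching a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The leftmost hit whose suffix is a prefix of b is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest overlap; return the contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

For example, `greedy_assemble(["ACGTTGCA", "TGCAGGTA", "GGTACCAT"])` returns `["ACGTTGCAGGTACCAT"]`. With error-free reads the "consensus" is simply the merged sequence. With real long reads the overlaps are inexact, and each consensus base is computed from many aligned reads.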
A scaffold is a set of ordered and oriented contigs that are linked to one another by mate pairs of sequencing reads or, as in this assembly, by Bionano optical maps.
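In sequence form, a scaffold is its contigs laid out in order, with reverse-oriented contigs reverse-complemented and the unknown gaps between them filled with Ns. The Python sketch below assumes a hypothetical `layout` list of (contig name, orientation) pairs and a fixed gap of 100 Ns. Real scaffolders, including optical-map-based ones, typically estimate the size of each gap instead.

```python
# Complement table for upper- and lower-case bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, gap_len=100):
    """Join ordered, oriented contigs into one scaffold sequence.

    contigs: {name: sequence}
    layout:  [(name, "+" or "-"), ...] in scaffold order
    gap_len: number of Ns placed between adjacent contigs (assumed fixed here)
    """
    parts = []
    for name, orient in layout:
        seq = contigs[name]
        parts.append(seq if orient == "+" else revcomp(seq))
    return ("N" * gap_len).join(parts)
```

For example, `build_scaffold({"c1": "AACC", "c2": "GGTT"}, [("c1", "+"), ("c2", "-")], gap_len=3)` returns `"AACCNNNAACC"`. The second contig is reverse-complemented before it is joined.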
The K0326Y genome was annotated with the MAKER-P pipeline (Campbell et al. 2014) using comprehensive evidence from homologous protein sequences, K0326Y transcripts, and ab initio prediction. Parameters and evidence were similar to those recently used to annotate B73 (Law et al. 2015; Jiao et al. 2016). Repeats were masked with the same repeat library used for the B73 annotation (Schnable et al. 2009), supplemented with an LTR library annotated by LTRharvest (Ellinghaus et al. 2008). K0326Y transcript evidence came from two sources: RNA-seq reads assembled with Trinity (Grabherr et al. 2011), and 247,616 non-redundant PacBio isoforms sequenced from cDNA libraries of ten K0326Y tissues. Cross-species protein sequences for Arabidopsis thaliana, Oryza sativa, Setaria italica, Sorghum bicolor, and Zea mays were downloaded from Ensembl Genomes release 41. Ab initio gene models were predicted with Augustus (Stanke et al. 2006) and FGENESH (Salamov and Solovyev 2000), trained for maize and other monocots. Stable gene identifiers were assigned in the format Zm00054aXXXXXX, where XXXXXX is a random 6-digit number, as specified in A Standard For Maize Genetics Nomenclature, available at MaizeGDB.
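The identifier format can be checked mechanically. The Python sketch below validates and generates IDs of the form Zm00054aXXXXXX. The function names are hypothetical. Drawing distinct numbers with `random.sample` is only one way to honor the "random 6-digit number" wording and is not necessarily how the K0326Y IDs were produced. The nomenclature standard at MaizeGDB remains the authority on the format.

```python
import random
import re

# "Zm00054a" prefix followed by exactly six ASCII digits.
GENE_ID = re.compile(r"Zm00054a[0-9]{6}")


def is_valid_gene_id(gene_id: str) -> bool:
    """True if gene_id matches the Zm00054aXXXXXX pattern exactly."""
    return GENE_ID.fullmatch(gene_id) is not None


def assign_gene_ids(n: int, seed: int = 0) -> list[str]:
    """Draw n distinct random 6-digit numbers and format them as gene IDs."""
    rng = random.Random(seed)
    numbers = rng.sample(range(1_000_000), n)  # distinct, so IDs never collide
    return [f"Zm00054a{num:06d}" for num in numbers]
```

Sampling without replacement guarantees uniqueness within one batch. The `:06d` format keeps leading zeros, so every number fills all six digit positions.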