NCBI Rhopalosiphum padi Annotation Release GCF_020882245.1-RS_2023_11

The genome sequence records for Rhopalosiphum padi RefSeq assembly GCF_020882245.1 (ASM2088224v1) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_020882245.1-RS_2023_11".

Date of Entrez queries for transcripts and proteins: Nov 9 2023
Date of submission of annotation to the public databases: Nov 15 2023
Software version: 10.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM2088224v1	GCF_020882245.1	China Agricultural University	11-16-2021	Reference	4 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM2088224v1
Genes and pseudogenes	15,592
protein-coding	13,285
non-coding	1,816
Transcribed pseudogenes	39
Non-transcribed pseudogenes	452
genes with variants	4,265
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	21,556
fully-supported	19,523
with > 5% ab initio	1,472
partial	91
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	21,556
non-coding RNAs	2,861
fully-supported	2,393
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	2,617
pseudo transcripts	39
fully-supported	36
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	39
CDSs	21,556
fully-supported	19,523
with > 5% ab initio	1,524
partial	91
with major correction(s)	512
known RefSeq (NP_)	0
model RefSeq (XP_)	21,556

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	15,101	12,797	4,486	65	622,929
All transcripts	24,417	2,689	2,054	65	53,821
mRNA	21,556	2,851	2,180	249	53,821
misc_RNA	399	3,116	2,460	150	42,305
tRNA	244	74	73	71	84
lncRNA	1,994	1,394	982	102	16,768
snoRNA	44	100	79	65	211
snRNA	44	147	140	102	195
rRNA	136	1,210	158	119	4,214
Single-exon transcripts	1,133	1,404	1,073	249	14,408
coding transcripts (NM_/XM_ )	1,133	1,404	1,073	249	14,408
CDSs	21,556	1,890	1,377	108	53,148
Exons	115,566	319	182	1	35,360
in coding transcripts (NM_/XM_ )	109,210	319	183	1	35,360
in non-coding transcripts (NR_/XR_ )	8,737	289	165	9	35,360
Introns	99,267	2,127	183	30	454,913
in coding transcripts (NM_/XM_ )	94,601	2,080	171	30	436,298
in non-coding transcripts (NR_/XR_ )	6,961	2,595	388	31	454,913

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.63	1	1	50
Number of exons per transcript	9.08	7	1	160

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the hemiptera_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.
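The selection of one longest protein per gene can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the function name, the record layout, the identifiers, and the tie-breaking rule (keep the first protein seen) are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def longest_protein_per_gene(
    records: Iterable[Tuple[str, str, int]]
) -> Dict[str, Tuple[str, int]]:
    """Map each gene ID to the (protein ID, length) pair with the greatest length.

    On ties, the protein seen first is kept (an assumed tie-break).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, protein_id, length in records:
        current = best.get(gene_id)
        if current is None or length > current[1]:
            best[gene_id] = (protein_id, length)
    return best


# Hypothetical identifiers and lengths, for illustration only.
example = [
    ("geneA", "protA.1", 320),
    ("geneA", "protA.2", 410),
    ("geneB", "protB.1", 150),
]
selected = longest_protein_per_gene(example)
```

Reducing each gene to a single protein keeps genes with many isoforms from being counted more than once in the completeness assessment.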

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Drosophila melanogaster known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 13,285 coding genes, 8,423 had a protein with an alignment covering 50% or more of the query, and 2,544 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
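The two coverage definitions reduce to the same ratio applied to different sequences. The sketch below assumes a single contiguous alignment given as 1-based, inclusive coordinates and ignores gaps; the function name and all coordinates and lengths are hypothetical.

```python
def coverage_percent(start: int, end: int, length: int) -> float:
    """Percentage of a sequence of `length` residues spanned by [start, end].

    Coordinates are 1-based and inclusive (an assumed convention).
    """
    if length <= 0 or not (1 <= start <= end <= length):
        raise ValueError("interval must lie within the sequence")
    return 100.0 * (end - start + 1) / length


# Hypothetical alignment: residues 1-380 of a 400-residue annotated protein
# (query) aligned to residues 21-400 of a 500-residue target protein.
query_cov = coverage_percent(1, 380, 400)
target_cov = coverage_percent(21, 400, 500)

# Thresholds used in the report's counts, applied to the query coverage.
passes_50 = query_cov >= 50.0
passes_95 = query_cov >= 95.0
```

In this example the alignment covers 95% of the query but only 76% of the target, which is why the two measures are reported separately.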

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Drosophila melanogaster known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM2088224v1	GCF_020882245.1	52.57%
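The report does not state how the masked percentage is computed. One plausible reading, assuming the masked genome is available as soft-masked sequence (masked bases in lowercase) and that every base, including N, counts toward the total, is sketched below. The function name and the toy sequence are illustrative.

```python
def masked_percent(sequence: str) -> float:
    """Percentage of bases written in lowercase, i.e. soft-masked (assumed convention)."""
    total = len(sequence)
    if total == 0:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / total


# Toy sequence: 8 lowercase bases out of 20.
toy = "ACGTacgtACGTacgtNNNN"
toy_masked = masked_percent(toy)
```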

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	597	593 (99.33%)	584 (97.82%)	99.64%	99.29%
Same-species EST	17,892	10,766 (60.17%)	10,479 (58.57%)	99.27%	95.09%
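The percentages in the table are consistent with dividing each count by the number of sequences retrieved from Entrez and rounding to two decimals. The check below reproduces them from the counts; the helper name is an illustrative choice.

```python
def percent(part: int, total: int) -> float:
    """Return part / total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 2)


# Counts taken from the table above.
genbank_aligned = percent(593, 597)
genbank_passed = percent(584, 597)
est_aligned = percent(10_766, 17_892)
est_passed = percent(10_479, 17_892)
```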

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	3,046,353,564	90%	30%	107,995
SAMEA104458993	whole body, (Rhopalosiphum padi, SAMEA104458993)	1,761,062,126	94%	28%	105,103
SAMEA3505144	Head (Rhopalosiphum padi, clonal female, SAMEA3505144)	49,489,960	92%	27%	90,042
SAMEA3505145	Head (Rhopalosiphum padi, clonal female, SAMEA3505145)	30,033,282	94%	27%	84,499
SAMEA3505146	Head (Rhopalosiphum padi, clonal female, SAMEA3505146)	38,240,340	95%	25%	85,022
SAMEA3505147	bodies (without nymphs) (Rhopalosiphum padi, clonal female, SAMEA3505147)	46,793,958	96%	30%	89,071
SAMEA3505148	bodies (without nymphs) (Rhopalosiphum padi, clonal female, SAMEA3505148)	24,628,546	95%	29%	81,417
SAMEA3505149	bodies (without nymphs) (Rhopalosiphum padi, clonal female, SAMEA3505149)	38,282,188	95%	28%	88,071
SAMN04532692	whole organism (Rhopalosiphum padi, SAMN04532692)	31,008,348	78%	43%	85,036
SAMN07187762	whole body (Rhopalosiphum padi, SAMN07187762)	54,135,326	30%	21%	70,684
SAMN10031460	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031460)	48,254,726	85%	28%	84,193
SAMN10031461	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031461)	46,507,670	69%	30%	82,879
SAMN10031462	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031462)	43,329,590	62%	26%	81,882
SAMN10031463	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031463)	56,894,484	89%	37%	91,647
SAMN10031464	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031464)	44,832,926	86%	35%	90,379
SAMN10031465	whole body (Rhopalosiphum padi, pooled male and female, SAMN10031465)	50,296,216	78%	24%	83,845
SAMN12331458	whole-body (Rhopalosiphum padi, 3rd, SAMN12331458)	46,903,846	89%	30%	77,049
SAMN12331459	whole-body (Rhopalosiphum padi, 3rd, SAMN12331459)	48,874,118	88%	27%	77,514
SAMN12530355	whole body (Rhopalosiphum padi, SAMN12530355)	56,267,972	88%	36%	83,500
SAMN12530356	whole body (Rhopalosiphum padi, SAMN12530356)	55,557,260	64%	34%	86,135
SAMN20525692	muscles (Rhopalosiphum padi, SAMN20525692)	62,468,816	90%	33%	88,798
SAMN23638105	whole body (Rhopalosiphum padi, SAMN23638105)	54,504,886	55%	32%	90,720
SAMN32957389	Whole body (Rhopalosiphum padi, 1st instar, SAMN32957389)	60,660,650	94%	40%	86,179
SAMN32957390	Whole body (Rhopalosiphum padi, 1st instar, SAMN32957390)	54,177,166	93%	40%	85,895
SAMN32957391	Whole body (Rhopalosiphum padi, 1st instar, SAMN32957391)	53,389,860	94%	41%	84,977
SAMN32957392	Whole body (Rhopalosiphum padi, mature embryo, SAMN32957392)	61,752,004	94%	42%	86,920
SAMN32957393	Whole body (Rhopalosiphum padi, mature embryo, SAMN32957393)	63,296,868	94%	41%	85,956
SAMN32957394	Whole body (Rhopalosiphum padi, mature embryo, SAMN32957394)	64,710,432	94%	42%	84,928

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR983159	ERX1064374	ERP011063	SAMEA3505144	49,489,960	92%	27%
ERR983160	ERX1064375	ERP011063	SAMEA3505145	30,033,282	94%	27%
ERR983161	ERX1064376	ERP011063	SAMEA3505146	38,240,340	95%	25%
ERR983162	ERX1064377	ERP011063	SAMEA3505147	46,793,958	96%	30%
ERR983163	ERX1064378	ERP011063	SAMEA3505148	24,628,546	95%	29%
ERR983164	ERX1064379	ERP011063	SAMEA3505149	38,282,188	95%	28%
ERR2238845	ERX2291005	ERP106130	SAMEA104458993	15,696,906	94%	28%
ERR2238846	ERX2291006	ERP106130	SAMEA104458993	15,601,530	94%	28%
ERR2238847	ERX2291007	ERP106130	SAMEA104458993	15,248,142	93%	28%
ERR2238848	ERX2291008	ERP106130	SAMEA104458993	15,838,788	93%	28%
ERR2238849	ERX2291009	ERP106130	SAMEA104458993	9,436,058	94%	27%
ERR2238850	ERX2291010	ERP106130	SAMEA104458993	9,303,026	94%	27%
ERR2238851	ERX2291011	ERP106130	SAMEA104458993	9,135,920	94%	27%
ERR2238852	ERX2291012	ERP106130	SAMEA104458993	9,459,144	94%	27%
ERR2238853	ERX2291013	ERP106130	SAMEA104458993	13,619,260	94%	28%
ERR2238854	ERX2291014	ERP106130	SAMEA104458993	13,474,940	94%	28%
ERR2238855	ERX2291015	ERP106130	SAMEA104458993	13,202,244	94%	28%
ERR2238856	ERX2291016	ERP106130	SAMEA104458993	13,695,518	94%	28%
ERR2238857	ERX2291017	ERP106130	SAMEA104458993	14,216,036	94%	28%
ERR2238858	ERX2291018	ERP106130	SAMEA104458993	14,032,452	94%	28%
ERR2238859	ERX2291019	ERP106130	SAMEA104458993	13,771,672	93%	28%
ERR2238860	ERX2291020	ERP106130	SAMEA104458993	14,271,724	94%	28%
ERR2238861	ERX2291021	ERP106130	SAMEA104458993	16,157,614	94%	28%
ERR2238862	ERX2291022	ERP106130	SAMEA104458993	15,916,200	94%	28%
ERR2238863	ERX2291023	ERP106130	SAMEA104458993	15,639,188	93%	28%
ERR2238864	ERX2291024	ERP106130	SAMEA104458993	16,188,808	93%	28%
ERR2238865	ERX2291025	ERP106130	SAMEA104458993	15,745,098	93%	26%
ERR2238866	ERX2291026	ERP106130	SAMEA104458993	15,624,250	93%	26%
ERR2238867	ERX2291027	ERP106130	SAMEA104458993	15,294,236	93%	26%
ERR2238868	ERX2291028	ERP106130	SAMEA104458993	15,853,956	93%	26%
ERR2238869	ERX2291029	ERP106130	SAMEA104458993	16,980,132	94%	28%
ERR2238870	ERX2291030	ERP106130	SAMEA104458993	16,860,194	94%	28%
ERR2238871	ERX2291031	ERP106130	SAMEA104458993	16,483,882	93%	28%
ERR2238872	ERX2291032	ERP106130	SAMEA104458993	17,120,530	93%	28%
ERR2238873	ERX2291033	ERP106130	SAMEA104458993	16,433,876	94%	28%
ERR2238874	ERX2291034	ERP106130	SAMEA104458993	16,340,170	94%	28%
ERR2238875	ERX2291035	ERP106130	SAMEA104458993	15,958,758	94%	28%
ERR2238876	ERX2291036	ERP106130	SAMEA104458993	16,572,276	94%	28%
ERR2238877	ERX2291037	ERP106130	SAMEA104458993	14,056,620	95%	28%
ERR2238878	ERX2291038	ERP106130	SAMEA104458993	14,279,328	95%	28%
ERR2238879	ERX2291039	ERP106130	SAMEA104458993	14,080,064	95%	28%
ERR2238880	ERX2291040	ERP106130	SAMEA104458993	14,332,968	95%	28%
ERR2238881	ERX2291041	ERP106130	SAMEA104458993	13,104,382	94%	29%
ERR2238882	ERX2291042	ERP106130	SAMEA104458993	13,570,652	94%	29%
ERR2238883	ERX2291043	ERP106130	SAMEA104458993	13,246,724	94%	29%
ERR2238884	ERX2291044	ERP106130	SAMEA104458993	13,671,244	94%	29%
ERR2238885	ERX2291045	ERP106130	SAMEA104458993	17,781,466	93%	28%
ERR2238886	ERX2291046	ERP106130	SAMEA104458993	17,635,604	93%	28%
ERR2238887	ERX2291047	ERP106130	SAMEA104458993	17,278,848	93%	28%
ERR2238888	ERX2291048	ERP106130	SAMEA104458993	17,905,224	93%	28%
ERR2238889	ERX2291049	ERP106130	SAMEA104458993	16,543,524	94%	28%
ERR2238890	ERX2291050	ERP106130	SAMEA104458993	16,276,634	94%	29%
ERR2238891	ERX2291051	ERP106130	SAMEA104458993	15,989,244	93%	28%
ERR2238892	ERX2291052	ERP106130	SAMEA104458993	16,579,390	93%	29%
ERR2238893	ERX2291053	ERP106130	SAMEA104458993	16,853,082	94%	28%
ERR2238894	ERX2291054	ERP106130	SAMEA104458993	16,702,288	94%	28%
ERR2238895	ERX2291055	ERP106130	SAMEA104458993	16,356,388	93%	28%
ERR2238896	ERX2291056	ERP106130	SAMEA104458993	16,953,814	94%	28%
ERR2238897	ERX2291057	ERP106130	SAMEA104458993	17,133,638	94%	29%
ERR2238898	ERX2291058	ERP106130	SAMEA104458993	16,958,550	94%	29%
ERR2238899	ERX2291059	ERP106130	SAMEA104458993	16,604,080	93%	29%
ERR2238900	ERX2291060	ERP106130	SAMEA104458993	17,217,950	94%	29%
ERR2238901	ERX2291061	ERP106130	SAMEA104458993	9,921,112	95%	29%
ERR2238902	ERX2291062	ERP106130	SAMEA104458993	10,313,760	95%	29%
ERR2238903	ERX2291063	ERP106130	SAMEA104458993	10,039,374	95%	29%
ERR2238904	ERX2291064	ERP106130	SAMEA104458993	10,385,994	95%	29%
ERR2238905	ERX2291065	ERP106130	SAMEA104458993	11,163,762	95%	28%
ERR2238906	ERX2291066	ERP106130	SAMEA104458993	11,390,240	95%	28%
ERR2238907	ERX2291067	ERP106130	SAMEA104458993	11,176,516	95%	28%
ERR2238908	ERX2291068	ERP106130	SAMEA104458993	11,422,858	95%	28%
ERR2238909	ERX2291069	ERP106130	SAMEA104458993	9,325,328	95%	28%
ERR2238910	ERX2291070	ERP106130	SAMEA104458993	9,776,866	95%	28%
ERR2238911	ERX2291071	ERP106130	SAMEA104458993	9,456,738	95%	28%
ERR2238912	ERX2291072	ERP106130	SAMEA104458993	9,859,790	95%	28%
ERR2238913	ERX2291073	ERP106130	SAMEA104458993	12,488,012	95%	28%
ERR2238914	ERX2291074	ERP106130	SAMEA104458993	12,727,018	95%	28%
ERR2238915	ERX2291075	ERP106130	SAMEA104458993	12,535,636	95%	28%
ERR2238916	ERX2291076	ERP106130	SAMEA104458993	12,788,712	95%	28%
ERR2238917	ERX2291077	ERP106130	SAMEA104458993	8,676,760	95%	27%
ERR2238918	ERX2291078	ERP106130	SAMEA104458993	8,792,626	95%	27%
ERR2238919	ERX2291079	ERP106130	SAMEA104458993	8,669,832	95%	27%
ERR2238920	ERX2291080	ERP106130	SAMEA104458993	8,820,600	95%	27%
ERR2238921	ERX2291081	ERP106130	SAMEA104458993	11,034,466	95%	29%
ERR2238922	ERX2291082	ERP106130	SAMEA104458993	11,551,506	95%	29%
ERR2238923	ERX2291083	ERP106130	SAMEA104458993	11,212,006	95%	29%
ERR2238924	ERX2291084	ERP106130	SAMEA104458993	11,664,926	95%	29%
ERR2238925	ERX2291085	ERP106130	SAMEA104458993	8,838,838	94%	28%
ERR2238926	ERX2291086	ERP106130	SAMEA104458993	9,378,682	93%	28%
ERR2238927	ERX2291087	ERP106130	SAMEA104458993	9,016,788	93%	28%
ERR2238928	ERX2291088	ERP106130	SAMEA104458993	9,506,080	93%	28%
ERR2238929	ERX2291089	ERP106130	SAMEA104458993	12,574,674	95%	28%
ERR2238930	ERX2291090	ERP106130	SAMEA104458993	12,769,002	95%	28%
ERR2238931	ERX2291091	ERP106130	SAMEA104458993	12,589,068	95%	28%
ERR2238932	ERX2291092	ERP106130	SAMEA104458993	12,819,542	95%	28%
ERR2238933	ERX2291093	ERP106130	SAMEA104458993	12,981,794	95%	28%
ERR2238934	ERX2291094	ERP106130	SAMEA104458993	13,196,396	95%	28%
ERR2238935	ERX2291095	ERP106130	SAMEA104458993	13,005,628	95%	28%
ERR2238936	ERX2291096	ERP106130	SAMEA104458993	13,254,876	95%	28%
ERR2238937	ERX2291097	ERP106130	SAMEA104458993	13,405,154	94%	26%
ERR2238938	ERX2291098	ERP106130	SAMEA104458993	13,635,348	94%	26%
ERR2238939	ERX2291099	ERP106130	SAMEA104458993	13,436,468	94%	26%
ERR2238940	ERX2291100	ERP106130	SAMEA104458993	13,702,596	94%	26%
ERR2238941	ERX2291101	ERP106130	SAMEA104458993	11,718,142	94%	27%
ERR2238942	ERX2291102	ERP106130	SAMEA104458993	11,889,238	94%	27%
ERR2238943	ERX2291103	ERP106130	SAMEA104458993	11,729,936	94%	27%
ERR2238944	ERX2291104	ERP106130	SAMEA104458993	11,925,252	94%	27%
ERR2238945	ERX2291105	ERP106130	SAMEA104458993	27,058,126	94%	27%
ERR2238946	ERX2291106	ERP106130	SAMEA104458993	27,850,068	94%	27%
ERR2238947	ERX2291107	ERP106130	SAMEA104458993	27,247,354	94%	27%
ERR2238948	ERX2291108	ERP106130	SAMEA104458993	27,980,992	94%	27%
ERR2238949	ERX2291109	ERP106130	SAMEA104458993	15,618,848	94%	27%
ERR2238950	ERX2291110	ERP106130	SAMEA104458993	7,724,894	188%	27%
ERR2238951	ERX2291111	ERP106130	SAMEA104458993	15,189,996	93%	27%
ERR2238952	ERX2291112	ERP106130	SAMEA104458993	15,716,574	93%	27%
ERR2238953	ERX2291113	ERP106130	SAMEA104458993	18,627,486	94%	28%
ERR2238954	ERX2291114	ERP106130	SAMEA104458993	18,401,428	94%	28%
ERR2238955	ERX2291115	ERP106130	SAMEA104458993	18,075,614	93%	28%
ERR2238956	ERX2291116	ERP106130	SAMEA104458993	18,710,350	93%	28%
ERR2238957	ERX2291117	ERP106130	SAMEA104458993	15,377,504	94%	28%
ERR2238958	ERX2291118	ERP106130	SAMEA104458993	15,004,240	94%	29%
ERR2238959	ERX2291119	ERP106130	SAMEA104458993	14,787,216	94%	29%
ERR2238960	ERX2291120	ERP106130	SAMEA104458993	15,296,680	94%	29%
ERR2238961	ERX2291121	ERP106130	SAMEA104458993	28,543,336	94%	28%
ERR2238962	ERX2291122	ERP106130	SAMEA104458993	29,106,966	94%	28%
ERR2238963	ERX2291123	ERP106130	SAMEA104458993	28,655,198	94%	28%
ERR2238964	ERX2291124	ERP106130	SAMEA104458993	29,237,082	94%	28%
SRR3203855	SRX1613231	SRP071143	SAMN04532692	31,008,348	78%	43%
SRR5642368	SRX2879746	SRP108464	SAMN07187762	54,135,326	30%	21%
SRR7824945	SRX4675995	SRP161633	SAMN10031460	48,254,726	85%	28%
SRR7824946	SRX4675994	SRP161633	SAMN10031461	46,507,670	69%	30%
SRR7824938	SRX4676002	SRP161633	SAMN10031462	43,329,590	62%	26%
SRR7824941	SRX4675999	SRP161633	SAMN10031463	56,894,484	89%	37%
SRR7824929	SRX4676011	SRP161633	SAMN10031464	44,832,926	86%	35%
SRR7824942	SRX4675998	SRP161633	SAMN10031465	50,296,216	78%	24%
SRR9722915	SRX6480465	SRP215840	SAMN12331458	46,903,846	89%	30%
SRR9722916	SRX6480464	SRP215840	SAMN12331459	48,874,118	88%	27%
SRR9945313	SRX6693811	SRP218018	SAMN12530355	56,267,972	88%	36%
SRR9945312	SRX6693812	SRP218018	SAMN12530356	55,557,260	64%	34%
SRR15365008	SRX11667462	SRP331425	SAMN20525692	62,468,816	90%	33%
SRR17183779	SRX13366773	SRP350056	SAMN23638105	54,504,886	55%	32%
SRR23300090	SRX19243118	SRP420389	SAMN32957389	60,660,650	94%	40%
SRR23300089	SRX19243119	SRP420389	SAMN32957390	54,177,166	93%	40%
SRR23300088	SRX19243120	SRP420389	SAMN32957391	53,389,860	94%	41%
SRR23300087	SRX19243121	SRP420389	SAMN32957392	61,752,004	94%	42%
SRR23300086	SRX19243122	SRP420389	SAMN32957393	63,296,868	94%	41%
SRR23300085	SRX19243123	SRP420389	SAMN32957394	64,710,432	94%	42%
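Read counts in the per-run table can be rolled up to sample level by summing over runs that share a sample ID. The sketch below assumes the rows are tab-separated in the column order shown above (Run, Experiment, Project, Sample, Number of reads, ...). The function name is illustrative, and only three rows are included.

```python
from collections import defaultdict
from typing import Dict, Iterable


def reads_per_sample(rows: Iterable[str]) -> Dict[str, int]:
    """Sum the 'Number of reads' column per sample from tab-separated run rows."""
    totals: Dict[str, int] = defaultdict(int)
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        sample = fields[3]
        reads = int(fields[4].replace(",", ""))  # drop thousands separators
        totals[sample] += reads
    return dict(totals)


# Three rows copied from the per-run table.
rows = [
    "ERR983159\tERX1064374\tERP011063\tSAMEA3505144\t49,489,960\t92%\t27%",
    "ERR2238845\tERX2291005\tERP106130\tSAMEA104458993\t15,696,906\t94%\t28%",
    "ERR2238846\tERX2291006\tERP106130\tSAMEA104458993\t15,601,530\t94%\t28%",
]
totals = reads_per_sample(rows)
```

For samples sequenced in a single run, the summed value should equal the read count listed in the per-sample table.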

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Myzus persicae high-quality model RefSeq (XP_)	11,299	10,595 (93.77%)	10,595 (93.77%)	80.30%	89.60%
Diuraphis noxia high-quality model RefSeq (XP_)	7,506	7,411 (98.73%)	7,411 (98.73%)	84.26%	93.28%
Halyomorpha halys high-quality model RefSeq (XP_)	11,226	7,723 (68.80%)	7,723 (68.80%)	61.27%	55.79%
Same-species GenBank	565	565 (100.00%)	565 (100.00%)	87.37%	94.77%
Insecta GenBank	130,716	87,280 (66.77%)	87,280 (66.77%)	68.01%	69.98%
Insecta known RefSeq (NP_)	9,564	6,966 (72.84%)	6,966 (72.84%)	71.22%	71.01%
Acyrthosiphon pisum high-quality model RefSeq (XP_)	11,742	10,949 (93.25%)	10,949 (93.25%)	76.88%	87.10%
Bemisia tabaci high-quality model RefSeq (XP_)	11,628	7,907 (68.00%)	7,907 (68.00%)	62.88%	58.93%
Drosophila melanogaster known RefSeq (NP_)	30,279	18,330 (60.54%)	18,330 (60.54%)	65.81%	59.33%
Cimex lectularius high-quality model RefSeq (XP_)	11,205	7,648 (68.26%)	7,648 (68.26%)	64.01%	61.55%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
