| Paired-end libraries for sequencing were prepared according to the manufacturer's instructions (Illumina, TruSeq™). |
| In short, approximately 1 µg of genomic DNA, isolated from frozen blood samples, was fragmented to a mean target size of 300 bp |
| using a Covaris E210 instrument. The resulting fragmented DNA was end repaired using T4 and Klenow polymerases and T4 polynucleotide |
| kinase with 10 mM dNTP followed by addition of an 'A' base at the ends using Klenow exo fragment (3′ to 5′ exo minus) and dATP (1 mM). |
| Sequencing adaptors containing 'T' overhangs were ligated to the DNA products followed by agarose (2%) gel electrophoresis. Fragments |
| of about 400-500 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified DNA fragments were PCR enriched |
| for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and a PCR primer cocktail (Illumina). Enriched libraries were further |
| purified using AMPure XP beads (Beckman-Coulter). The quality and concentration of the libraries were assessed with the Agilent 2100 |
| Bioanalyzer using the DNA 1000 LabChip (Agilent). Barcoded libraries were stored at −20 °C. All steps in the workflow were monitored |
| using an in-house laboratory information management system with barcode tracking of all samples and reagents. Template DNA fragments were |
| hybridized to the surface of flow cells (GA PE cluster kit (v2) or HiSeq PE cluster kits (v2.5 or v3)) and amplified to form clusters |
| using the Illumina cBot. In brief, DNA (2.5–12 pM) was denatured, followed by hybridization to grafted adaptors on the flow cell. |
| Isothermal bridge amplification using Phusion polymerase was then followed by linearization of the bridged DNA, denaturation, blocking |
| of 3´ ends and hybridization of the sequencing primer. Sequencing-by-synthesis (SBS) was performed on Illumina GAIIx and/or HiSeq 2000 |
| instruments. Paired-end libraries were sequenced at 2 × 101 (HiSeq) or 2 × 120 (GAIIx) cycles of incorporation and imaging using |
| the appropriate TruSeq™ SBS kits. Each library or sample was initially run on a single GAIIx lane for QC validation followed by further |
| sequencing on either GAIIx (≥4 lanes) or HiSeq (≥1 lane) with targeted raw cluster densities of 500–800 k/mm², depending on the version |
| of the data imaging and analysis packages (SCS2.6-2-9/RTA1.6-1.9, HCS1.3.8-1.4.8/RTA1.10.36-1.12.4.2). Real-time analysis involved conversion |
| of image data to base-calling in real-time. Reads were aligned to NCBI Build 36 (hg18) of the human reference sequence using Burrows-Wheeler |
| Aligner (BWA) 0.5.7-0.5.9 (ref. 16). Alignments were merged into a single BAM file and marked for duplicates using Picard 1.55 |
| (http://picard.sourceforge.net/). Only non-duplicate reads were used for the downstream analyses. Resulting BAM files were realigned and |
| recalibrated using GATK version 1.2-29-g0acaf2d (refs. 8, 17). Multi-sample calling was performed with GATK version 2.3.9 using all the 2,636 BAM |
| files together. |
| Genotype calls made solely on the basis of next generation sequence data yield errors at a rate that decreases as a function of sequencing |
| depth. Thus, for example, if sequence reads at a heterozygous SNP position carry one copy of the alternative allele and seven copies of |
| the reference allele, then without further information the genotype would be called homozygous for the reference allele. To minimize the |
| number of such errors, we used information about haplotype sharing, taking advantage of the fact that all the sequenced individuals had |
| also been chip-typed and long-range phased (LRP; Figure 2). Extending the previous example, if the individual shares a haplotype with another |
| who is heterozygous given his sequence reads, then the ambiguous individual would be called as heterozygous. Conversely, if the individual |
| shares both his haplotypes with others who are homozygous for the major allele his genotype would be called homozygous. In order to improve |
| genotype quality and to phase the sequencing genotypes, an iterative algorithm based on the IMPUTE HMM model 2 which uses the LRP haplotypes |
| was employed. Co-ordinates from genome build 36 (GCF_000001405.12) were lifted over to builds 37 (GCA_000001405.14) and 38 (GCA_000001405.17) |
| using the liftover tool from UCSC and the default options. |
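
The read-count example above (one read carrying the alternative allele, seven carrying the reference allele) can be made concrete with a small calculation. The sketch below is only an illustration and is not the IMPUTE-HMM-based iterative algorithm described in the text: it assumes a simple binomial read model, a hypothetical per-read error rate of 0.01, and a hypothetical prior standing in for the information contributed by haplotype sharing. The function names and all numeric values are assumptions made for the example.

```python
from math import comb

GENOTYPES = ("ref/ref", "ref/alt", "alt/alt")


def genotype_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each genotype.

    error_rate is an assumed per-read miscall probability (hypothetical value).
    """
    n = n_ref + n_alt
    # Probability that a single read shows the alternative allele, per genotype.
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    return {
        g: comb(n, n_alt) * p_alt[g] ** n_alt * (1.0 - p_alt[g]) ** n_ref
        for g in GENOTYPES
    }


def call_genotype(likelihoods, prior=None):
    """Return the most probable genotype and the normalized posterior."""
    if prior is None:
        # No outside information: every genotype is equally likely a priori.
        prior = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
    unnormalized = {g: likelihoods[g] * prior[g] for g in GENOTYPES}
    total = sum(unnormalized.values())
    posterior = {g: v / total for g, v in unnormalized.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# One alternative read and seven reference reads, as in the text.
lik = genotype_likelihoods(n_ref=7, n_alt=1)

# Reads alone favor the homozygous reference call.
reads_only_call, _ = call_genotype(lik)

# Hypothetical prior: the individual shares a haplotype with a confidently
# heterozygous carrier, so most prior weight sits on ref/alt.
sharing_prior = {"ref/ref": 0.05, "ref/alt": 0.94, "alt/alt": 0.01}
informed_call, _ = call_genotype(lik, sharing_prior)

print(reads_only_call, informed_call)
```

Under these assumed values the reads on their own favor the homozygous reference genotype, while the haplotype-sharing prior moves the call to heterozygous, which mirrors the reasoning in the paragraph above.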