The contribution of common regulatory and protein-coding TYR variants in the genetic architecture of albinism

Genetic diseases have been historically segregated into rare Mendelian and common complex conditions. Large-scale studies using genome sequencing are eroding this distinction and are gradually unmasking the underlying complexity of human traits. We studied a cohort of 1,313 individuals with albinism aiming to gain insights into the genetic architecture of rare, autosomal recessive disorders. We investigated the contribution of regulatory and protein-coding variants at the common and rare ends of the allele-frequency spectrum. We focused on TYR, the gene encoding tyrosinase, and found that a promoter variant, TYR: c.-301C>T [rs4547091], modulates the penetrance of a prevalent, disease-associated missense change, TYR: c.1205G>A [rs1126809]. We also found that homozygosity for a haplotype formed by three common, functional variants, TYR: c.[-301C;575C>A;1205G>A], confers a high risk of albinism (OR>77) and is associated with reduced vision in UK Biobank participants. Finally, we report how the combined analysis of rare and common variants increases diagnostic yield and informs genetic counselling in families with albinism.


MAIN
There is abundant evidence supporting the view that rare genetic diseases are caused by rare, high-impact variants in individual genes. However, for virtually all known rare disorders, it is not possible to identify such pathogenic changes in every affected individual, leaving significant diagnostic and knowledge gaps. [9][10][11] In recent years, the emergence of comprehensive rare disease and population-based resources that link genomic and phenotypic data (e.g. UK Biobank 12 , Genomics England 100,000 Genomes Project 13 ) has offered unprecedented opportunities for genetic discovery. Through integrative analysis of the associated datasets, we can now achieve line-of-sight for uncovering complex molecular explanations in people with rare disorders who have hitherto remained undiagnosable.
Albinism, a rare condition characterised by decreased ocular pigmentation and altered visual system organisation 14 , had a pivotal role in the study of human genetics tracing back to the early 20th century. 15,16 At least 20 genes are now known to be associated with this disorder and the current diagnostic yield of genetic testing in affected cohorts approaches 75%. [17][18][19] Most people with a molecular diagnosis of albinism carry biallelic variants in TYR, the gene encoding the rate-limiting enzyme of melanin biosynthesis 20 . Building on recent work 17,21 , we sought to increase our understanding of the genetic complexity of this archetypal disorder.
A cohort of 1208 people with albinism underwent testing of ≤19 albinism-related genes; these individuals were not known to be related and had predominantly European ancestries. A further 105 probands with albinism were identified in the Genomics England 100,000 Genomes Project dataset 13 . A "control" cohort of 29,497 unrelated individuals that had no recorded diagnosis/features of albinism was also identified in this resource (Fig.1).

medRxiv preprint, this version posted November 1, 2021; doi: https://doi.org/10.1101/2021.11.01.21265733. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[...] Multiple associations have been recorded for these two changes including skin/hair pigmentation (for both variants), macular thickness (for c.575C>A) and iris colour (for c.1205G>A). 23 Furthermore, each of these changes has been shown to decrease TYR enzymatic activity in vitro. 24,25 Importantly, there is evidence suggesting that c.1205G>A is acting as a hypomorphic variant and is causing a mild form of albinism when in compound heterozygous state with a complete loss-of-function TYR mutation. 26

To gain insights into the contribution of regulatory variants, we studied the impact of changes that alter TYR regulatory elements (i.e. the TYR promoter or ENCODE-listed enhancers) 28 and affect TYR gene expression (i.e. they are known TYR expression quantitative trait loci [eQTL]). One such variant was identified, c.-301C>T [rs4547091], a fetal retinal pigment epithelium (RPE) selective eQTL 29 . This change is known to alter a binding site for the transcription factor OTX2 in the TYR promoter, and the reference allele (c.-301C) has been shown to lead to a remarkable decrease in promoter activity in vitro 30 .

[...] independent analysis of each of the TYR c.-301C>T, c.575C>A and c.1205G>A changes and instead studied the haplotype blocks that they form.
Eight possible haplotypes [$2^3$] and 36 possible haplotype pairs [$2^{3-1} \times (2^3 + 1)$] may be encountered.
We focused only on the 8 haplotype pairs that include homozygous alleles (Fig.2A) for two reasons: (1) in homozygous individuals, the underlying haplotypes can be unambiguously determined, even in cases where segregation/phasing data are unavailable; (2) in autosomal recessive disorders like TYR-associated albinism, phenotypic abnormalities are the result of the combined effect of two alleles; by analyzing only homozygous cases, the effect of a specific haplotype can be isolated and estimated with greater precision.
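The haplotype counts quoted above can be checked by direct enumeration. The following is a minimal sketch, assuming each of the three TYR sites is treated as biallelic; the variable names are illustrative and not taken from the study.

```python
from itertools import combinations_with_replacement, product

# Alleles at the three TYR sites, in the order c.-301, c.575, c.1205.
SITES = [("C", "T"), ("C", "A"), ("G", "A")]

# Every combination of one allele per site gives one haplotype (2^3 = 8).
haplotypes = [";".join(alleles) for alleles in product(*SITES)]

# Unordered pairs of haplotypes, allowing a haplotype to pair with itself.
pairs = list(combinations_with_replacement(haplotypes, 2))

# Pairs made of two copies of the same haplotype (homozygous at all three sites).
homozygous_pairs = [pair for pair in pairs if pair[0] == pair[1]]

print(len(haplotypes), len(pairs), len(homozygous_pairs))  # 8 36 8
```

The 36 unordered pairs correspond to 8 × 9 / 2, which equals the expression given in the text, and the 8 homozygous pairs are the ones retained for analysis.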
We used Firth regression analysis 31,32 to study how TYR haplotypes in homozygous state affect the risk of albinism. This increasingly recognised logistic regression approach has been designed to handle small, imbalanced datasets (which are common in studies of rare conditions) and allows for adjustment of key covariates (which is not possible in contingency table methods) (Fig.1 [...]).

Fig.2 (caption, panels b and c). b. Risk of albinism in people carrying selected TYR haplotypes in homozygous state. Odds ratio >1 suggests an increased risk while odds ratio <1 suggests a decreased risk. Further information including numeric data can be found in Supplementary Table 2. c. Visual acuity in UK Biobank participants carrying selected TYR haplotypes in homozygous state. Vision near 0.0 LogMAR is considered normal while vision >0.5 LogMAR is considered moderate/severe visual impairment. The Kruskal-Wallis p-value was $8 \times 10^{-11}$. Further information including numeric data can be found in Supplementary Table 4.

We found that the penetrance of the hypomorphic TYR c.1205G>A [...] c.1205G>A combined with the c.-301T allele (which increases TYR expression) has a protective effect (OR<0.6; see [T;C;A] in Fig.2B). This observation is in keeping with previous studies suggesting that penetrance can be modified by the joint functional effects of regulatory and protein-coding variants. 33 We here provide a key illustration of this mechanism in the context of a recessively-acting hypomorphic variant.
Alongside this, we found that homozygosity for the TYR c. [...] Table 3).
Our findings also highlight that homozygosity for the haplotype formed by the c.- [...] Table 5). We expect that future studies of visual function and ocular structure in this group of homozygous individuals will provide key insights into the elusive link between RPE melanin synthesis and visual system organisation. 35

Finally, we quantified the risk of albinism associated with combinations of rare and common variants. For each study participant, we estimated two key contributors to an individual's risk. First, we counted the number of rare, presumed Mendelian variants in albinism-associated genes; single nucleotide variants that have MAF<1% and are labelled as disease-causing (DM) in the Human Gene Mutation Database (HGMD) v2021.2 36 were considered. Subsequently, we counted the number of common "risk genotypes" in TYR (i.e. c.-301C, c.575A and/or c.1205A). We found that the presence of >4 common TYR risk genotypes confers an increased risk of albinism even in the absence of a rare, HGMD-listed variant (OR>4; Table 1). We also found that when a single heterozygous HGMD-listed variant co-occurs with >1 common TYR risk genotype, the risk of albinism is increased (OR>4 for rare variants in any albinism-related gene, OR>6 for rare variants in TYR; Table 1). These observations provide a basis for more precise genetic counseling in families with albinism.
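The two per-participant tallies described above can be illustrated with a short sketch. It rests on an assumption that is not confirmed by the text: that the count of common "risk genotypes" is the number of risk alleles (c.-301C, c.575A, c.1205A) carried across the three TYR sites, giving a value between 0 and 6. The function names and the genotype encoding are hypothetical.

```python
# Risk alleles named in the text: c.-301C, c.575A and c.1205A.
RISK_ALLELE = {"c.-301": "C", "c.575": "A", "c.1205": "A"}


def count_common_risk_alleles(genotypes):
    """Count risk alleles over the three TYR sites (assumed definition, range 0-6).

    `genotypes` maps each site to a two-character string of alleles, e.g. "CT".
    """
    return sum(genotypes[site].count(allele) for site, allele in RISK_ALLELE.items())


def risk_profile(genotypes, n_rare_hgmd_variants):
    """Pair the common-variant tally with the count of rare HGMD-listed (DM) variants."""
    return {
        "common_risk_alleles": count_common_risk_alleles(genotypes),
        "rare_hgmd_variants": n_rare_hgmd_variants,
    }


# Hypothetical participant: homozygous c.-301C, heterozygous c.575C>A,
# homozygous c.1205A, and no rare HGMD-listed variant.
example = risk_profile({"c.-301": "CC", "c.575": "CA", "c.1205": "AA"}, 0)
print(example)  # {'common_risk_alleles': 5, 'rare_hgmd_variants': 0}
```

Under this assumed definition, the hypothetical participant would fall in the ">4 common TYR risk genotypes" group with no rare variant.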
In conclusion, we have shown that a significant proportion of albinism risk arises from genetic susceptibility linked to common variants. Our findings also suggest that rare and common protein-coding variation in TYR should be considered in the context of regulatory haplotypes. The concepts discussed here are highly likely to be relevant to the understanding of other rare disorders, and haplotype-based approaches are expected to narrow the diagnostic gap for significant numbers of patients. Future work will embrace more diverse populations and focus on integrating both common and rare variants (including single-nucleotide and copy-number changes) into a single genetic risk score at scale.

Cohort characteristics and genotyping
University Hospital of Bordeaux albinism cohort: Individuals with albinism were identified through the database of the University Hospital of Bordeaux Molecular Genetics Laboratory, France. This is a national reference laboratory that has been performing genetic testing for albinism since 2003 and has been receiving samples from individuals predominantly based in France (or French-administered overseas territories). Information on the dermatological and ophthalmological phenotypes was available and all people included in the study had at least one of the key ocular features of albinism, i.e. nystagmus or absence of a foveal pit (prominent foveal hypoplasia). Only individuals who were not knowingly related were included.
Genotyping, bioinformatic analyses, and clinical interpretation were performed as previously described. 17,26 Briefly, most participants had gene-panel testing of 19 genes associated with albinism (TYR, OCA2, TYRP1, SLC45A2, SLC24A5, C10ORF11, GPR143, HPS 1 to 10, LYST, SLC38A8) using IonTorrent platforms. High-resolution array-CGH (comparative genomic hybridization) was also used to detect copy number variants in these 19 genes. All genetic changes of interest were confirmed with an alternative method (e.g. Sanger sequencing or quantitative PCR). Clinical interpretation of variants was performed using criteria consistent with the 2015 American College of Medical Genetics and Genomics (ACMG) best practice guidelines 37 . Generally, variants with MAF ≥1% in large publicly available datasets (e.g. gnomAD 27 ) were considered unlikely to be disease-causing. We note that the genetic findings in a subset of this cohort (70%; 845/1208) have been partly reported in a previous publication. 17 Due to the limited number of genes screened in this cohort, it was not possible to reliably assess genetic ancestry and to objectively assign individuals to ancestry groups. Attempting to mitigate this, we processed available data on self-identified ethnicity that were collected through questionnaires.
Responses were inspected and stratification into five broad continental groups (European, African, Admixed American, East Asian, South Asian) was performed.

[...] 41 . The multi-sample VCF was then split into 1,371 roughly equal chunks to allow faster processing and the loci of interest were queried using bcftools v1.9 42 (see https://research-help.genomicsengland.co.uk/display/GERE/ for further information). Only variants that passed all provided site quality control criteria were processed. In addition, we filtered out genotypes with: genotype score <20; read depth <10; allele balance <0.2 or >0.8 for heterozygotes; allele balance >0.1 or <0.9 for homozygotes (reference and alternate, respectively).
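The genotype-level filters listed above can be expressed as a single predicate. This is a sketch under stated assumptions rather than the pipeline used in the study: "genotype score" is read as genotype quality, allele balance is taken to be the fraction of reads supporting the alternate allele, and the heterozygote rule is read as excluding values below 0.2 or above 0.8. The function name and genotype labels are hypothetical.

```python
def passes_genotype_qc(genotype_quality, read_depth, allele_balance, genotype):
    """Return True if a genotype call survives the filters described in the text.

    allele_balance: assumed to be alternate-allele reads / total reads.
    genotype: one of "hom_ref", "het" or "hom_alt" (illustrative labels).
    """
    if genotype_quality < 20 or read_depth < 10:
        return False
    if genotype == "het":
        # Heterozygotes are removed when allele balance is <0.2 or >0.8.
        return 0.2 <= allele_balance <= 0.8
    if genotype == "hom_ref":
        # Homozygous reference calls are removed when allele balance is >0.1.
        return allele_balance <= 0.1
    if genotype == "hom_alt":
        # Homozygous alternate calls are removed when allele balance is <0.9.
        return allele_balance >= 0.9
    raise ValueError(f"unknown genotype label: {genotype}")


print(passes_genotype_qc(35, 28, 0.47, "het"))      # True
print(passes_genotype_qc(35, 28, 0.85, "het"))      # False
print(passes_genotype_qc(35, 8, 0.95, "hom_alt"))   # False (read depth below 10)
```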
Genomic annotation was performed using Ensembl VEP 43 ; one additional annotation was included: presence of a variant in HGMD v2021.2 36 with a "disease-causing" (DM) label.
Ancestry inference was performed in this cohort using principal component analysis. Data from the 1000 Genomes Project (phase 3) dataset 38 [...] (this can be found online at https://research-help.genomicsengland.co.uk/display/GERE/Ancestry+inference).
We focused on a pre-determined subset of the Genomics England 100,000 Genomes Project dataset that includes only unrelated probands (n=29,602). 105 of these individuals had a diagnosis of albinism, i.e. the ICD-10 term "Albinism" [E70.3] and/or the HPO terms "Albinism" [HP:0001022], "Partial albinism" [HP:0007443] or "Ocular albinism" [HP:0001107] were assigned. Together with the University Hospital of Bordeaux cases, these 105 probands formed the "case" cohort (for the albinism risk analysis). The remaining 29,497 probands had no recorded diagnosis or phenotypic features of albinism and formed the "control" cohort.

Identifying functional regulatory and protein-coding variants
Regulatory variants: Focusing on TYR, we identified changes that are likely to have an impact on gene regulation by selecting variants that:
• are known TYR eQTLs.
• alter TYR cis-regulatory elements, including the promoter of the gene.
To identify eQTLs, we inspected the eQTL catalogue 44 and used data from the Genotype-Tissue Expression v8 (GTEx) 45 and Eye Genotype Expression (EyeGEx) 46 projects. To identify regulatory elements, we used the ENCODE 3 (ENCyclopedia Of DNA Elements phase 3) dataset 28 and inspected chromatin accessibility peaks in RPE samples in DESCARTES (the Developmental Single Cell Atlas of Gene Regulation and Expression) 47 . Extensive search of the biomedical literature (e.g. 29 ) was also performed. All these queries were conducted in January 2021.
Common protein-coding variants: Focusing on TYR, we identified common changes that are likely to have an impact on protein function by selecting variants that:
• have a CADD PHRED-scaled score ≥20. CADD is an integrative annotation tool built from more than 60 genomic features. A PHRED-scaled score ≥10 indicates a raw score in the top 10% of all possible single nucleotide variants, while a score ≥20 indicates a raw score in the top 1%. 22
• alter protein-coding sequences, including missense changes, nonsense variants and small insertions/deletions; variants with a potential role in splicing (e.g. synonymous changes and variants altering splice donor/acceptor sites) were not included.
• are included in the following HGMD v2021.2 "mutation type" categories: missense/nonsense, splicing, small deletions, small insertions or small indels; gross deletions, gross insertions/duplications and complex rearrangements were not analysed.
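The relationship between a PHRED-scaled score and the "top 10%" / "top 1%" statements in the list above follows from the standard PHRED transform, score = -10 · log10(fraction). The sketch below only illustrates that arithmetic; the function names are hypothetical and it does not reproduce how CADD ranks raw scores.

```python
import math


def phred_scaled(top_fraction):
    """PHRED-style score for a variant whose raw score lies in the given top fraction.

    top_fraction: fraction of all possible SNVs scoring at least as high (0 < f <= 1).
    """
    return -10.0 * math.log10(top_fraction)


def top_fraction_for(phred_score):
    """Inverse transform: the top fraction implied by a PHRED-scaled score."""
    return 10.0 ** (-phred_score / 10.0)


for score in (10, 20, 30):
    print(score, top_fraction_for(score))
```

A threshold of ≥20 therefore keeps variants whose raw score is within roughly the top 1% of all possible single nucleotide variants, and each further 10 points corresponds to another tenfold restriction.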

Case-control analysis to estimate albinism risk
The effect of homozygosity for selected TYR haplotypes (formed by one common regulatory change, c.-301C>T, and two common protein-coding variants, c.575C>A and c.1205G>A) on albinism risk was estimated using data from the University Hospital of Bordeaux albinism cohort and the Genomics England 100,000 Genomes Project dataset. A case-control analysis of a binary trait (presence/absence of albinism) was conducted assuming a recessive model. There was case-control imbalance and the two cohorts were imperfectly matched, especially in terms of genetic ancestry and genotyping approach used. These sources of bias should be taken into account when interpreting the results, especially findings with low-effect and/or low-confidence signal.
Logistic regression using Firth's bias reduction method 31,32 was used (as implemented in the "logistf" R package) 50 . The following covariates were included: gender, number of rare HGMD-listed variants and ancestry (Supplementary Table 2).
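To make the idea behind Firth's bias reduction concrete, the sketch below implements the penalised-score iteration from scratch in plain Python. It is an illustration only: the study used the "logistf" R package, the counts in the toy example are invented, and the example includes a single binary exposure rather than the covariates listed above. The helper names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def firth_logistic(X, y, max_iter=200, tol=1e-10):
    """Fit logistic regression by iterating on Firth's modified score equations."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information X^T W X and its inverse.
        info = [[sum(wi * row[a] * row[b] for wi, row in zip(w, X)) for b in range(k)]
                for a in range(k)]
        inv = invert(info)
        # Diagonal of the hat matrix: h_i = w_i * x_i^T (X^T W X)^-1 x_i.
        h = [wi * sum(row[a] * inv[a][b] * row[b] for a in range(k) for b in range(k))
             for wi, row in zip(w, X)]
        # Firth-modified score: X^T (y - pi + h * (0.5 - pi)).
        score = [sum(row[a] * (yi - p + hi * (0.5 - p))
                     for row, yi, p, hi in zip(X, y, pi, h)) for a in range(k)]
        step = [sum(inv[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented toy data: columns are [intercept, homozygous for a risk haplotype].
# 5 exposed cases, 0 exposed controls, 10 unexposed cases, 30 unexposed controls.
X = [[1, 1]] * 5 + [[1, 0]] * 40
y = [1] * 5 + [1] * 10 + [0] * 30
beta = firth_logistic(X, y)
odds_ratio = math.exp(beta[1])
print(round(odds_ratio, 2))
```

With an empty cell such as the zero exposed controls here, ordinary maximum likelihood has no finite estimate, whereas the penalised fit stays finite. For a single binary exposure, the Firth estimate coincides with the odds ratio obtained after adding 0.5 to every cell of the 2×2 table, which offers a simple check on the sketch.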

Analysis of visual acuity and foveal thickness in UK Biobank participants
The effect of homozygosity for selected TYR haplotypes was studied in UK Biobank participants. UK Biobank is a biomedical resource containing in-depth genetic and health information from >500,000 individuals from across the UK 12 . A subset of UK Biobank volunteers underwent enhanced phenotyping including visual acuity testing (131,985 individuals) and imaging of the central retina (84,748 individuals) 48 ; the latter was obtained using optical coherence tomography (OCT), a noninvasive imaging test that rapidly generates cross-sectional retinal scans at micrometre-resolution. 49 Notably, only 24 UK Biobank participants are assigned a diagnosis of albinism (data field 41270; ICD-10 term "Albinism" [E70.3]) of which only 7 had visual acuity measurements and none had OCT imaging. Given that reduced visual acuity and increased central retinal thickness (due to underdevelopment of the fovea) are two hallmark features of albinism, we investigated the impact of TYR risk haplotypes on these quantitative endophenotypes.
First, genotyping array data were used to obtain genotypes for TYR c.575C>A [rs1042602] and TYR c.1205G>A [rs1126809] (data field 22418 including information from the Applied Biosystems UK Biobank Axiom Array containing 825,927 markers). In contrast to these two changes, the TYR c.-301C>T [rs4547091] variant was not directly captured by the array. However, high-quality (>99.9%) imputation data on this promoter change were available (data field 22828).
Subsequently, we calculated the mean of the right and left LogMAR visual acuity for each UK Biobank volunteer (data fields 5201 and 5208, "instance 0" datasets). These measurements were then used to compare visual performance between groups of people with different homozygous haplotype combinations. As the obtained distributions deviated from normality (Fig.2C), the Kruskal-Wallis test was used. Pair-wise comparisons were performed and the p-values were adjusted using the Benjamini-Hochberg method (Supplementary Table 4).
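The Benjamini-Hochberg adjustment applied to the pair-wise p-values can be sketched in a few lines. This is a generic illustration of the procedure, not the code used in the study; the function name is hypothetical and the input p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values from four hypothetical pair-wise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as the raw p-values get smaller.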
To obtain central foveal thickness measurements from UK Biobank OCT images, we calculated the mean of the right and left central retinal thickness (defined as the average distance between the hyperreflective bands corresponding to the RPE and the internal limiting membrane (ILM), across the central 1 mm diameter circle of the ETDRS grid) for each UK Biobank volunteer. 51 The obtained measurements were then used to compare central macular thickness between groups of UK Biobank volunteers with different homozygous TYR haplotype combinations. As some of the obtained distributions deviated from normality (Supplementary Fig.2), the Kruskal-Wallis test was used. Pair-wise comparisons were performed and the p-values were adjusted using the Benjamini-Hochberg method (Supplementary Table 5).
TABLE

Table 1. Contribution of different classes of albinism-associated variants to disease risk

Column headings: combination of common a and rare b risk genotypes; odds ratio c