• Defining the role of genetic factors in the etiology of complex diseases and traits, such as cancer, heart disease and diabetes, across racial and ethnic populations
  • Developing new, large multiethnic population-based resources for genetic epidemiologic research, including large-scale genome-wide association studies (GWAS) and next-generation sequencing studies for cancer and other complex traits
  • Developing and applying novel statistical methods for genetic research in diverse multiethnic populations and for the integration of multi-‘omics’ data (e.g., germline variation, microbiome, DNA methylation, gene expression, protein levels, metabolomics) to better understand biological pathways involved in disease and possible targets for treatment and prevention
  • Investigating the evolutionary forces that shaped the genetic architecture of complex traits within and between populations, including understanding the impact of population demography and natural selection on health disparity between populations and tracing the evolutionary origin of risk alleles identified through human genetics studies

Studies and Consortia

Human genetic research in the 21st century is highly collaborative, and members of the Center are among the leaders of these international efforts. Specifically, the center’s faculty are active members of several large studies and consortia assembled to investigate the genetic basis of disease. Many of these efforts focus on multiethnic minority populations and on reducing the health disparities these populations experience. Examples of such research include:

  • AAMMS was established in collaboration with 11 national cancer centers and four National Cancer Institute SEER registries to elucidate causes of multiple myeloma in African Americans, the highest risk group in the world. AAMMS has demographic, risk factor, detailed clinical phenotype and molecular data, serum, plasma, DNA and tissue samples from 1,810 African American multiple myeloma patients, the largest such collection in the world. GWAS has been conducted and genotypes are available, as well as imputed human leukocyte antigen types. Tumor tissue collection is ongoing.


  • Center members are leading GWAS, fine-mapping and polygenic risk score analyses of breast cancer among women of African ancestry.


  • CTP is one of the largest population-based twin registries in the world, with 50,000 participating twins born in California from 1918 to 1982. The program is regularly linked to the California Cancer Registry and USC CSP to obtain cancer diagnoses (more than 1,420 twins with 1,596 breast cancers, 1,675 twins with prostate cancer and 1,320 twins with melanoma). Serum, DNA, environmental samples, mammographic density, immunologic measures and tumor blocks are available from subsets of twins.


  • CIRCLE is a National Institute of Environmental Health Sciences/U.S. Environmental Protection Agency–funded program project focused on environmental risk factors for childhood leukemia and is a collaboration between USC, the University of California, Berkeley and Yale University. The Center for Genetic Epidemiology plays a major role in immunotoxicology, genetic and epigenetic projects within CIRCLE.


  • CCRLP involves linking several California-based databases to study all cancer types in children 0 to 19 years of age between 1988 and 2015 for which biological samples are available. CALSEC is a similar linkage covering cancer cases up to age 35 and up to the year 2015. These repositories form a basis for several cancer studies between USC, the University of California, Berkeley and Yale University.


  • The goal of the FinMetSeq consortium is to leverage the unique population bottleneck in Northern and Eastern Finland for increased power to map rare alleles (< 0.005%) associated with quantitative cardiometabolic traits. The consortium has generated more than 20,000 whole-exome sequences and is in the process of generating thousands of whole-genome sequences. This is a collaboration between Center for Genetic Epidemiology investigators and investigators at the Institute for Molecular Medicine Finland, McDonnell Genome Institute at Washington University in St. Louis, University of Michigan and University of California, Los Angeles.


  • IMAGE’s goal is to develop novel statistical methods to address some of the major problems facing cancer genetic epidemiologists in the post-GWAS era and to illustrate their use for discovery of novel biology in various colorectal cancer studies. These methods:

    • Leverage prior biological knowledge to inform integrative genomic analyses (project 1)
    • Use phylogenetic information to infer gene function as inputs to our epidemiologic modeling projects (project 2)
    • Model the role of the microbiome and the exposome in cancer risk (project 3)
    • Exploit intratumor heterogeneity to learn about somatic tumor evolution and how this process is modified by the internal environment (project 4)

    These four projects are supported by an administrative core and three shared resource cores on functional annotation, high performance computing and software development and distribution. The entire program is motivated by an overall objective of providing tools for evaluating the impact of potential preventive or therapeutic interventions based on modifiable risk factors.

  • MEC is a large epidemiologic study that follows more than 215,000 residents of Hawaii and Los Angeles for development of cancer and other chronic diseases. It includes men and women of five main ethnic groups: Japanese Americans, Native Hawaiians, African Americans, Latinos and whites, with GWAS data and biospecimens available for more than 70,000 individuals. More than 16,000 MEC individuals will also be whole-genome sequenced as part of the National Human Genome Research Institute’s Centers for Common Disease Genomics program.


  • PAGE includes genomics data from more than 120,000 diverse individuals from six well-characterized cohorts/biobanks. The goals of PAGE are to:

    • Identify genetic variants that influence complex traits and diseases in ancestrally diverse individuals using both whole-genome sequence data and GWAS data
    • Integrate information on sequence variation and “-omics” to better understand the genetic underpinnings of complex traits in the diverse PAGE participants
    • Characterize biological pathways underlying disease risk both within and between populations


  • As part of these consortia, center members are leading a number of GWAS and exome sequencing studies of prostate cancer and aggressive phenotypes across racial and ethnic populations in the U.S. and globally.


  • The ReCord Study will conduct the largest, most comprehensive backtracking study of childhood leukemia. ReCord will collect leukemia samples and stored cord blood for several hundred children with acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). ReCord will then determine which leukemia mutations arise before birth and whether leukemia mutations found in cord blood are related to known risk factors for childhood leukemia.


  • SPARK, a large-scale autism genetics research study with over 100,000 autistic individuals and 175,000 family members participating, spans multiple sites across the U.S. The study’s mission is to uncover autism’s causes and foster the development of more effective treatments and supports. SPARK utilizes a variety of genome sequencing technologies, including whole-exome sequencing, whole-genome sequencing, and genotyping arrays, for both individuals with autism and their family members. Beyond genetic data, the study also integrates medical histories, as well as social and behavioral assessments of its participants.


  • The RESPOND study is one of the largest studies ever to look at the underlying factors that put African American men at higher risk for prostate cancer. Over the next five years, 10,000 African American men with prostate cancer will be recruited.


  • The TOPMed program, which is supported by the National Heart, Lung, and Blood Institute, has collected whole-genome sequencing and other “-omics” data, including methylation, proteomic, metabolite and RNA measurements, from diverse ethnic groups. Several broadly phenotyped epidemiologic studies contribute to the program, which integrates “-omics” data with molecular, behavioral, imaging, environmental and clinical data to improve the prevention and treatment of heart, lung, blood and sleep disorders. Center for Genetic Epidemiology faculty are actively involved in the lung, hematology and inflammatory biomarkers working groups.


  • RTR is a tissue bank of donated formalin-fixed, paraffin-embedded tissue blocks from more than 75,000 patients diagnosed throughout Los Angeles County, a subset of the population covered by the SEER Cancer Registry for Los Angeles County (USC Cancer Surveillance Program [CSP]). Individual patient data, including demographics, clinical data and survival, are obtained by linkage to the USC CSP and the California Cancer Registry. RTR currently contains tumor blocks from more than 17,300 Hispanic, 10,900 African American, 7,600 Asian and 39,000 non-Hispanic white cancer patients.


Statistical Software

Advances in human genetic research are fueled by advances in technology and methodology. To this end, the center’s faculty have developed analytical frameworks to address the biological, epidemiological, statistical and evolutionary questions encountered in human genetics. In particular, the center’s faculty have developed the following statistical software, which is commonly used in genetic epidemiology studies:

  • BVS is an R package for analyzing case-control association studies involving a group of genetic variants. Its main focus is to model the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package allows numerous genetic predictors to be modeled either jointly as main effects or in combination as expected haplotypes, conditional on the SNPs currently selected in a model, and it can model rare variants via the Bayesian Risk Index. Most notably, the package allows for the incorporation of external biological information via a set of specified prior covariates to inform the marginal inclusion probabilities.


  • eGRM is a genetic relationship matrix constructed from genome-wide genealogies inferred from genetic data. Conditioning on the genealogies, the eGRM can reveal more details of finer-scale population structure. Published in Fan et al., AJHG 2022 (PMID: 35417677).


  • FIZI leverages functional information together with reference linkage-disequilibrium (LD) to impute GWAS summary statistics (Z-scores).
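
    As a rough illustration of LD-based imputation of Z-scores, the sketch below applies the standard Gaussian conditional-mean formula: imputed Z-scores are a weighted sum of observed Z-scores, with weights derived from reference LD. It uses LD only and omits the functional information that FIZI adds, so it should not be read as FIZI's method or API. The function name `impute_z`, the ridge term `lam` and the toy LD matrix are assumptions made for this example.

```python
import numpy as np

def impute_z(ld, z_obs, obs_idx, unobs_idx, lam=0.1):
    """Impute Z-scores at untyped SNPs from typed SNPs and a reference LD matrix.

    Hypothetical helper: LD-only Gaussian imputation, not FIZI's implementation.
    """
    ld = np.asarray(ld, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)

    # LD among typed SNPs, and between untyped and typed SNPs.
    s_oo = ld[np.ix_(obs_idx, obs_idx)]
    s_uo = ld[np.ix_(unobs_idx, obs_idx)]

    # Ridge term (assumed) stabilizes the inverse when the reference panel is small.
    s_oo_reg = s_oo + lam * np.eye(len(obs_idx))

    # weights = s_uo @ inv(s_oo_reg); s_oo_reg is symmetric, so solve and transpose.
    weights = np.linalg.solve(s_oo_reg, s_uo.T).T

    z_imp = weights @ z_obs
    # Predicted imputation quality: diagonal of weights @ s_oo @ weights.T.
    r2_pred = np.einsum("ij,ij->i", weights @ s_oo, weights)
    return z_imp, r2_pred

# Toy example: three SNPs with LD decaying with distance; the middle SNP is untyped.
toy_ld = [[1.00, 0.50, 0.25],
          [0.50, 1.00, 0.50],
          [0.25, 0.50, 1.00]]
z_imputed, quality = impute_z(toy_ld, z_obs=[2.0, 3.0], obs_idx=[0, 2], unobs_idx=[1])
print(z_imputed, quality)
```

    SNPs with low predicted quality are usually filtered out before downstream analysis, since their imputed Z-scores shrink toward zero.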


  • FOCUS is software to fine-map transcriptome-wide association study statistics at genomic risk regions. The software takes as input summary GWAS data along with eQTL weights and outputs a credible set of genes to explain observed genomic risk.
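
    To make the idea of a credible set concrete, the sketch below uses a common construction: rank genes by posterior probability and keep adding genes until the cumulative normalized probability reaches a target level. This is an illustration under simplifying assumptions, not FOCUS's code; the function name `credible_set`, the gene labels and the probabilities are hypothetical.

```python
def credible_set(posteriors, level=0.9):
    """Return the smallest top-ranked gene set whose normalized posterior mass reaches `level`.

    `posteriors` maps gene name -> posterior probability (hypothetical input format).
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank genes from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

    chosen, cumulative = [], 0.0
    for gene, prob in ranked:
        chosen.append(gene)
        cumulative += prob / total  # normalize so the mass sums to 1
        if cumulative >= level:
            break
    return chosen

# Made-up posterior probabilities at one risk region.
example = {"GENE_A": 0.70, "GENE_B": 0.25, "GENE_C": 0.04, "GENE_D": 0.01}
print(credible_set(example, level=0.9))
```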


  • JAM (joint analysis of marginal summary statistics) is a scalable algorithm for the re-analysis of published marginal summary statistics under joint multi–single nucleotide polymorphism (SNP) models. Correlation among SNPs is accounted for according to estimates from a reference data set, and the models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework.
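
    The relationship between marginal and joint effects can be sketched with ordinary least squares. Assuming standardized genotypes and trait, the vector of marginal effects equals the LD correlation matrix times the vector of joint effects, so joint effects can be recovered by solving that linear system. The sketch below shows only this least-squares core and leaves out JAM's Bayesian model selection; the function name `joint_from_marginal` and the `ridge` parameter are assumptions.

```python
import numpy as np

def joint_from_marginal(marginal_beta, ld, ridge=0.0):
    """Convert marginal SNP effects into joint effects using a reference LD matrix.

    Assumes standardized genotypes and trait, so marginal = ld @ joint.
    Hypothetical helper: illustrates the idea only, not JAM's implementation.
    """
    ld = np.asarray(ld, dtype=float)
    marginal_beta = np.asarray(marginal_beta, dtype=float)
    # Optional ridge term (assumed) keeps the system stable under near-perfect LD.
    return np.linalg.solve(ld + ridge * np.eye(len(marginal_beta)), marginal_beta)

# Two SNPs in strong LD; only the first has a true effect.
ld = np.array([[1.0, 0.8],
               [0.8, 1.0]])
true_joint = np.array([0.5, 0.0])
marginal = ld @ true_joint  # the second SNP looks associated through LD alone
print(joint_from_marginal(marginal, ld))  # approximately [0.5, 0.0]
```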


  • gLike performs demographic inference and estimates the parameters of a demographic model based on genealogical trees. Published in Fan et al., bioRxiv 2023 (PMID: 37873208).


  • LUCID is an integrative model for estimating latent unknown clusters. It aims to distinguish unique genomic, exposure and informative biomarker or “-omic” effects while jointly estimating subgroups relevant to the outcome of interest.


  • PriorityPruner prunes SNPs that are in high linkage disequilibrium (LD) with other SNPs in a list, while preferentially keeping SNPs of higher priority (e.g., the most significant SNPs in a GWAS). A user can input data in PLINK format with corresponding SNP annotation, including p-values and other SNP characteristics used for prioritization.

    PriorityPruner iterates over the entire list of inputted SNPs, in order of descending priority (e.g., lowest to highest p-value), to select LD-independent SNPs according to customizable options and thresholds.
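
    The priority-ordered selection described above can be sketched as a greedy loop. This is a minimal illustration, not PriorityPruner's code: it assumes priority is given by p-value alone and that a full pairwise r² matrix is already in memory. The function name `priority_prune` and the `r2_threshold` parameter are hypothetical.

```python
import numpy as np

def priority_prune(p_values, r2, r2_threshold=0.5):
    """Greedy LD pruning that visits SNPs from lowest to highest p-value.

    Returns the indices of the retained, approximately LD-independent SNPs.
    Hypothetical helper, not PriorityPruner's implementation.
    """
    p_values = np.asarray(p_values, dtype=float)
    r2 = np.asarray(r2, dtype=float)

    order = np.argsort(p_values, kind="stable")  # highest priority first
    pruned = np.zeros(len(p_values), dtype=bool)
    kept = []

    for i in order:
        if pruned[i]:
            continue  # already tagged by a higher-priority SNP
        kept.append(int(i))
        # Prune every SNP in high LD with the kept SNP. The diagonal entry
        # also marks SNP i itself as processed.
        pruned |= r2[i] > r2_threshold
    return kept

# Toy data: SNPs 0 and 1 are in high LD, as are SNPs 2 and 3.
p = [1e-8, 1e-6, 1e-2, 1e-3]
r2 = np.array([[1.0, 0.9, 0.1, 0.1],
               [0.9, 1.0, 0.1, 0.1],
               [0.1, 0.1, 1.0, 0.7],
               [0.1, 0.1, 0.7, 1.0]])
print(priority_prune(p, r2))  # [0, 3]
```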


  • RHOGE is an R package that estimates the genome-wide genetic correlation between two complex traits (diseases) as a function of predicted gene expression effect on trait (ρge). Given output from two transcriptome-wide association studies, RHOGE estimates the mediating effect of predicted gene expression and estimates the correlation of effect sizes across traits (diseases). This approach is extended to a bidirectional regression that provides putative causal directions between traits with non-zero ρge.


  • TWAS Simulator is software to simulate a complex trait as a function of latent steady-state expression, fit eQTL weights in independent data, and perform GWAS+TWAS on the simulated complex trait.
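
    The three steps named above (simulate a trait from latent expression, fit eQTL weights in independent data, run GWAS and TWAS) can be sketched end to end with NumPy. This is a simplified illustration and not the TWAS Simulator code: it assumes independent SNPs (no LD), ridge regression for the eQTL weights and simple correlation-based Z-scores. All function names, parameters and default values are hypothetical.

```python
import numpy as np

def assoc_z(x, y):
    """Z-score for the marginal linear association between x and y."""
    r = np.corrcoef(x, y)[0, 1]
    n = len(y)
    return r * np.sqrt((n - 2) / (1.0 - r**2))

def simulate_twas(n_gwas=5000, n_eqtl=500, n_snps=20,
                  h2_expr=0.3, expr_r2=0.1, ridge=1.0, seed=0):
    """Simulate one gene: expression -> trait, then run GWAS and TWAS."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, n_snps)

    def genotypes(n):
        # Independent SNPs (assumption), standardized per column.
        g = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
        return (g - g.mean(axis=0)) / g.std(axis=0)

    # True eQTL effects, scaled so genetics explains about h2_expr of expression.
    w = rng.normal(size=n_snps)
    w *= np.sqrt(h2_expr / np.sum(w**2))

    def expression(g):
        return g @ w + rng.normal(scale=np.sqrt(1 - h2_expr), size=len(g))

    # Step 1: fit eQTL weights in an independent reference panel (ridge regression).
    g_eqtl = genotypes(n_eqtl)
    expr_eqtl = expression(g_eqtl)
    w_hat = np.linalg.solve(g_eqtl.T @ g_eqtl + ridge * np.eye(n_snps),
                            g_eqtl.T @ expr_eqtl)

    # Step 2: simulate the trait from latent (unobserved) expression in the GWAS sample.
    g_gwas = genotypes(n_gwas)
    latent_expr = expression(g_gwas)
    trait = (np.sqrt(expr_r2) * latent_expr
             + rng.normal(scale=np.sqrt(1 - expr_r2), size=n_gwas))

    # Step 3: GWAS tests each SNP; TWAS tests genetically predicted expression.
    gwas_z = np.array([assoc_z(g_gwas[:, j], trait) for j in range(n_snps)])
    twas_z = assoc_z(g_gwas @ w_hat, trait)
    return {"gwas_z": gwas_z, "twas_z": twas_z, "eqtl_weights": w_hat}

result = simulate_twas()
print("max |GWAS Z|:", np.abs(result["gwas_z"]).max())
print("TWAS Z:", result["twas_z"])
```

    Repeating the simulation across seeds and parameter settings gives a simple way to compare the power of SNP-level and gene-level tests under a known causal model.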
