We have created a showcase workspace, featured on the Terra platform, that illustrates an entire workflow from a VCF file to summary plots. The workspace uses genotype data from the 1000 Genomes Project, which is publicly available on Terra.
GEM Showcase Workspace on Terra
Running GEM
At minimum, GEM requires values for the following parameters:
./GEM --bgen example.bgen \
--pheno-file example.pheno \
--sampleid-name sampleid \
--pheno-name pheno2 \
--pheno-type 1 \
--exposure-names cov1
Exposures
Multiple exposures can be included for testing by passing the exposure names separated by a single space. GEM will then output coefficient estimates and variances associated with each exposure term.
./GEM --bgen example.bgen --pheno-file example.pheno --sampleid-name sampleid --pheno-name pheno2 --pheno-type 1 \
--exposure-names cov1 cov2 cov3
Covariates
Covariates can be adjusted for with the --covar-names parameter. To include multiple covariates, pass the covariate names separated by a single space, as shown below.
./GEM --bgen example.bgen --pheno-file example.pheno --sampleid-name sampleid --pheno-name pheno2 --pheno-type 1 \
--exposure-names cov1 \
--covar-names cov2 cov3
Interaction Covariates
Interaction covariates can also be adjusted for with the --int-covar-names parameter. To include multiple interaction covariates, pass the interaction covariate names separated by a single space.
./GEM --bgen example.bgen --pheno-file example.pheno --sampleid-name sampleid --pheno-name pheno2 --pheno-type 1 \
--exposure-names cov1 \
--int-covar-names cov2 cov3
Covariates and interaction covariates can both be included in the same model.
./GEM --bgen example.bgen --pheno-file example.pheno --sampleid-name sampleid --pheno-name pheno2 --pheno-type 1 \
--exposure-names cov1 \
--covar-names cov2 \
--int-covar-names cov3
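Because every name passed to these parameters has to exist in the phenotype file, a quick pre-flight check can catch typos before a long run. The following is a minimal sketch, not part of GEM: the `check_columns` helper is hypothetical, and it assumes a comma-delimited phenotype file whose first line is a header row. The demo file is created only so the snippet runs on its own.

```shell
# Hypothetical helper (not part of GEM): verify that each given column name
# appears in the header row of a comma-delimited phenotype file (assumed format).
check_columns() {
    file=$1
    shift
    header=$(head -n 1 "$file")
    missing=0
    for name in "$@"; do
        # Wrap header and name in commas so only whole column names match.
        case ",$header," in
            *",$name,"*) ;;
            *) echo "missing column: $name" >&2; missing=1 ;;
        esac
    done
    return "$missing"
}

# Demo phenotype file with made-up values, for illustration only.
pheno=$(mktemp)
printf 'sampleid,pheno2,cov1,cov2,cov3\nS1,0.5,1,30,0\nS2,1.2,0,41,1\n' > "$pheno"

if check_columns "$pheno" sampleid pheno2 cov1 cov2 cov3; then
    echo "all columns found"
fi
```

The function returns a non-zero status when any name is missing, so it can gate the GEM call in a wrapper script.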
Multithreading
GEM supports multithreading through the --threads parameter. By default, GEM uses half the number of logical cores/threads detected. Each thread receives a roughly equal number of variants for association testing.
We recommend setting this to the total number of physical cores/processing units for optimal performance.
./GEM --bgen example.bgen --pheno-file example.pheno --sampleid-name sampleid --pheno-name pheno2 --pheno-type 1 \
--exposure-names cov1 \
--covar-names cov2 \
--int-covar-names cov3 \
--threads 4
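To find a value for --threads that matches the recommendation above, the physical core count can be read from the operating system. The sketch below is one possible approach on Linux and is not part of GEM: the `count_physical_cores` function is hypothetical, it assumes `lscpu` from util-linux is available, and it falls back to the logical processor count when it is not.

```shell
# Hypothetical helper (not part of GEM): estimate the number of physical cores.
count_physical_cores() {
    n=""
    if command -v lscpu >/dev/null 2>&1; then
        # Each unique (core, socket) pair corresponds to one physical core.
        n=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    fi
    # Fall back to the logical processor count if lscpu is missing or reports nothing.
    if [ -z "$n" ] || [ "$n" -lt 1 ]; then
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    fi
    echo "$n"
}

threads=$(count_physical_cores)
echo "Suggested value: --threads ${threads}"
```

The resulting number can then be passed to GEM in place of the fixed value, for example `--threads "$threads"`.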
Sample size changes from N to 0
This error occurs when GEM cannot match the sample identifiers in the genotype file to sample identifiers in the phenotype file.
BGEN files
For BGEN files, sample identifiers are matched between the phenotype file and either the sample identifier block within the BGEN file (see here) or an external .sample file (see here).
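When an external .sample file is used, the overlap between the two sets of identifiers can be checked before running GEM. The following is a rough diagnostic sketch under stated assumptions, not GEM's own matching logic: it assumes a comma-delimited phenotype file with a `sampleid` header column, and a .sample file with two header lines and the identifier in the first column. Both demo files are made up and deliberately constructed so that no identifiers match.

```shell
# Demo files with made-up values; substitute your own phenotype and .sample files.
workdir=$(mktemp -d)
printf 'sampleid,pheno2,cov1\nS1,0.5,1\nS2,1.2,0\nS3,0.9,1\n' > "$workdir/example.pheno"
printf 'ID_1 ID_2 missing\n0 0 0\nS1_S1 S1_S1 0\nS2_S2 S2_S2 0\nS3_S3 S3_S3 0\n' > "$workdir/example.sample"

# Phenotype IDs: locate the 'sampleid' column by header name, then print that column.
awk -F',' -v col="sampleid" \
    'NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next } c { print $c }' \
    "$workdir/example.pheno" | LC_ALL=C sort -u > "$workdir/pheno_ids.txt"

# Genotype IDs: skip the two header lines (assumed layout), take the first column.
awk 'NR > 2 { print $1 }' "$workdir/example.sample" | LC_ALL=C sort -u > "$workdir/geno_ids.txt"

# Count IDs present in both files; zero reproduces the "N to 0" situation.
matched=$(LC_ALL=C comm -12 "$workdir/pheno_ids.txt" "$workdir/geno_ids.txt" | wc -l | tr -d ' ')
echo "Matching sample IDs: ${matched}"

# Show a few genotype-side IDs with no phenotype match to reveal the naming pattern.
LC_ALL=C comm -13 "$workdir/pheno_ids.txt" "$workdir/geno_ids.txt" | head -n 3
```

Unmatched identifiers that look like two IDs joined by an underscore usually point to how the genotype file was exported rather than to a problem in the phenotype file.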
Solutions:
We have mostly encountered this error when using PLINK 2.0 to convert other genotype file formats (VCF/BED/PGEN) to BGEN format. As shown in the link, PLINK 2.0 by default constructs sample IDs as “id-paste=maybefid,iid,maybesid”.
This can be solved by passing “id-paste=iid” instead as shown below.
./plink2.exe --vcf my_vcf_file.vcf --export 'bgen-1.2' 'id-paste=iid' --out my_bgen_file