1. Format
1.1 LD Filtering
Software:
PLINK
Software download:
https://www.cog-genomics.org/plink
Reference script:
--double-id --indep-pairwise 50 5 0.5 --maf 0.01 --geno 0.2 --allow-extra-chr
1.2 Extracting SNPs
Software:
PLINK
Software download:
https://www.cog-genomics.org/plink
Reference script:
--double-id --extract your_file --recode vcf-iid --allow-extra-chr
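The reference scripts in 1.1 and 1.2 list only the option flags. The sketch below shows one way the two steps might be chained into full commands. It assumes the input is a VCF file (which the use of --double-id suggests) and that the list passed to --extract is the .prune.in file written by --indep-pairwise. The names input.vcf and pruned, the PLINK and DRY_RUN variables, and the helper functions are hypothetical and not part of the original protocol.

```shell
# Path to the PLINK binary (assumption: "plink" is on PATH unless PLINK is set).
PLINK="${PLINK:-plink}"
# DRY_RUN=1 prints the commands instead of running them (hypothetical switch).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: input VCF file, $2: output prefix
ld_prune_and_extract() {
    # Step 1.1: LD pruning; writes $2.prune.in and $2.prune.out
    run "$PLINK" --vcf "$1" --double-id --indep-pairwise 50 5 0.5 \
        --maf 0.01 --geno 0.2 --allow-extra-chr --out "$2"
    # Step 1.2: keep only the retained SNPs and write a new VCF
    run "$PLINK" --vcf "$1" --double-id --extract "$2.prune.in" \
        --recode vcf-iid --allow-extra-chr --out "${2}_snps"
}
```

Calling ld_prune_and_extract input.vcf pruned with DRY_RUN=1 prints the two commands for inspection; setting DRY_RUN=0 runs them.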
1.3 Convert VCF file to .geno format
Software:
vcf2geno
Software download:
https://github.com/zhanxw/vcf2geno
Reference script:
"$vcf2geno" "$input_file" your_output_file_name
2. sNMF Analysis
2.1 Structure Analysis
Software:
sNMF
Software download:
http://membres-timc.imag.fr/Olivier.Francois/snmf/index.htm
Reference script:
# Path to the .in.geno file generated in the previous step.
# Use an absolute path, because the loop below changes directory.
in_geno=your_in_geno_file_path
# Prefix for the output file names.
prefix=your_prefix
for K in $(seq 1 your_max_K)
do
    mkdir -p ${K}
    cd ${K}
    for r in $(seq 1 your_number_of_runs)
    do
        q_file=${prefix}.K${K}r${r}.Q
        g_file=${prefix}.K${K}r${r}.G
        seed=$((r * 1000))
        log_file=${prefix}.K${K}r${r}.log
        echo K=${K}
        "$sNMF" \
            -x ${in_geno} \
            -K ${K} \
            -c \
            -q ${q_file} \
            -g ${g_file} \
            -s ${seed} \
            -p 12 \
            > ./${log_file}
    done
    cd ..
done
grep "Cross-Entropy (masked data):" ./*/*.log > ./Cross-Entropy.txt
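The Cross-Entropy.txt file collected by the final grep can be summarised per K. The sketch below averages the cross-entropy over the runs of each K and reports the K with the lowest mean, which is a commonly used heuristic rather than a fixed rule. It assumes each line has the form ./K/prefix.KKrR.log:Cross-Entropy (masked data): value, with the value as the last whitespace-separated field; the function names are hypothetical.

```shell
# $1: path to Cross-Entropy.txt
# Prints one line per K: K, mean cross-entropy, number of runs (tab-separated).
summarize_cross_entropy() {
    awk '{
        # First field looks like ./3/prefix.K3r1.log:Cross-Entropy,
        # so the second path component is the K directory.
        split($1, parts, "/")
        k = parts[2]
        sum[k] += $NF   # last field is the cross-entropy value
        n[k]++
    }
    END {
        for (k in sum) printf "%s\t%.6f\t%d\n", k, sum[k] / n[k], n[k]
    }' "$1" | sort -n
}

# $1: path to Cross-Entropy.txt
# Prints the K with the lowest mean cross-entropy.
best_k() {
    summarize_cross_entropy "$1" |
        sort -t "$(printf '\t')" -k2,2n |
        head -n 1 |
        cut -f1
}
```

For example, summarize_cross_entropy ./Cross-Entropy.txt prints the per-K table and best_k ./Cross-Entropy.txt prints a single K value. It is usually worth looking at the whole table as well, since the curve may flatten rather than show a clear minimum.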
2.2 Analyze Q matrix file
Software:
pong
Software download:
Reference script:
pong -m "your file mapping table path" -l your_color_scheme_file_path --greedy -s 0.9
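The file mapping table passed to -m is a tab-delimited file with three columns per Q matrix: a unique run ID, the K value, and the path to the Q file relative to the table. The sketch below builds such a table from the Q files written in step 2.1, assuming the directory layout K/prefix.KKrR.Q used there and that it is run from the directory containing the K subdirectories. The function name, its arguments, and the run ID scheme (K2r1, K2r2, ...) are hypothetical.

```shell
# $1: prefix used in step 2.1
# $2: smallest K to include, $3: largest K to include
# $4: number of runs per K
make_pong_filemap() {
    fm_k=$2
    while [ "$fm_k" -le "$3" ]; do
        fm_r=1
        while [ "$fm_r" -le "$4" ]; do
            fm_q="${fm_k}/${1}.K${fm_k}r${fm_r}.Q"
            # Only list Q files that actually exist.
            if [ -f "$fm_q" ]; then
                printf 'K%sr%s\t%s\t%s\n' "$fm_k" "$fm_r" "$fm_k" "$fm_q"
            fi
            fm_r=$((fm_r + 1))
        done
        fm_k=$((fm_k + 1))
    done
}
```

For example, make_pong_filemap your_prefix 2 10 5 > pong_filemap.txt writes a table for K = 2 to 10 with five runs each, which can then be passed to pong with -m pong_filemap.txt.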