1. Format
1.1 LD filtering
Software:
PLINK
Software download:
https://www.cog-genomics.org/plink
Reference script:
--double-id --indep-pairwise 50 5 0.5 --maf 0.01 --geno 0.2 --allow-extra-chr
1.2 Extracting SNPs
Software:
PLINK
Software download:
https://www.cog-genomics.org/plink
Reference script:
--double-id --extract your_file --recode vcf-iid --allow-extra-chr
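The reference scripts in 1.1 and 1.2 list only the options. A minimal sketch of how the two steps could be chained is given below. The function names, the DRY_RUN switch, the use of --vcf for input and the output naming are assumptions for illustration, not part of the original protocol. It relies on PLINK writing the retained SNP IDs to <out>.prune.in when --indep-pairwise is used, and that file is then passed to --extract.

```shell
#!/usr/bin/env bash
# Sketch only: chains step 1.1 (LD filtering) and step 1.2 (SNP extraction).
# Assumptions: input is a VCF file, and PLINK points to a PLINK 1.9 executable.
# Set DRY_RUN=1 to print the commands instead of running them.
PLINK=${PLINK:-plink}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ld_prune_and_extract() {
    in_vcf=$1       # input VCF file (placeholder)
    out_prefix=$2   # prefix for all output files (placeholder)

    # Step 1.1: writes ${out_prefix}.prune.in (SNPs kept) and ${out_prefix}.prune.out
    run "$PLINK" --vcf "$in_vcf" --double-id --allow-extra-chr \
        --maf 0.01 --geno 0.2 --indep-pairwise 50 5 0.5 \
        --out "$out_prefix"

    # Step 1.2: keep only the retained SNPs and write ${out_prefix}.pruned.vcf
    run "$PLINK" --vcf "$in_vcf" --double-id --allow-extra-chr \
        --extract "${out_prefix}.prune.in" --recode vcf-iid \
        --out "${out_prefix}.pruned"
}
```

Example: DRY_RUN=1 ld_prune_and_extract your_input.vcf your_prefix prints the two PLINK commands so they can be checked before running them for real.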
1.3 Convert VCF file to .geno format
Software:
vcf2geno
Software download:
https://github.com/zhanxw/vcf2geno
Reference script:
"$vcf2geno" "$input_file" your_output_file_name
2. sNMF Analysis
2.1 Structure Analysis
Software:
sNMF
Software download:
http://membres-timc.imag.fr/Olivier.Francois/snmf/index.htm
Reference script:
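# $sNMF must hold the path to the sNMF executable.
in_geno=/absolute/path/to/your_file.geno   # .geno file generated in the previous step; use an absolute path because the loop changes directory
prefix=your_prefix                         # prefix for the output files
max_K=your_max_K                           # desired maximum K value
n_rep=your_number_of_repetitions           # desired number of repetitions (r) per K
for K in $(seq 1 "${max_K}")
do
    mkdir -p "${K}"
    cd "${K}" || exit 1
    for r in $(seq 1 "${n_rep}")
    do
        q_file=${prefix}.K${K}r${r}.Q
        g_file=${prefix}.K${K}r${r}.G
        seed=$((r * 1000))                 # a different seed for each repetition
        log_file=${prefix}.K${K}r${r}.log
        echo "K=${K} r=${r}"
        "$sNMF" \
            -x "${in_geno}" \
            -K "${K}" \
            -c \
            -q "${q_file}" \
            -g "${g_file}" \
            -s "${seed}" \
            -p 12 \
            > "./${log_file}"
    done
    cd ..
done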
grep "Cross-Entropy (masked data):" ./*/*.log > ./Cross-Entropy.txt
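Cross-Entropy.txt holds one line per run, so a per-K summary makes the values easier to compare. The sketch below averages the cross-entropy over the repetitions of each K. The function name is hypothetical, and the assumed line layout is ./<K>/<prefix>.K<K>r<r>.log:Cross-Entropy (masked data): <value>, which is what the grep command above produces for the directory structure used in 2.1. A lower mean cross-entropy is usually read as a better-supported K.

```shell
#!/usr/bin/env bash
# Sketch only: mean cross-entropy per K from the grep output in Cross-Entropy.txt.
# Prints tab-separated columns: K, mean cross-entropy, number of runs.
summarize_cross_entropy() {
    awk -F'[/:]' '
        {
            k = $2          # directory name, which is the K value
            v = $NF + 0     # last field holds the cross-entropy value
            sum[k] += v
            n[k]++
        }
        END {
            for (k in sum) printf "%d\t%.6f\t%d\n", k, sum[k] / n[k], n[k]
        }
    ' "$1" | sort -n
}
```

Example: summarize_cross_entropy ./Cross-Entropy.txt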
2.2 Analyze Q matrix file
Software:
pong
Software download:
Reference script:
pong -m your_filemap_path -l your_color_file_path --greedy -s 0.9
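The file passed to -m is a filemap that lists the Q matrices to be compared. The sketch below builds such a file from the directory layout produced in 2.1. It assumes the filemap is tab-separated with three columns (a run ID containing at least one letter, the K value, and the Q-matrix path relative to the filemap); check this against the pong documentation for the installed version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: list the Q files from section 2.1 as a pong filemap.
# Run it from the directory that contains the per-K subdirectories (1/, 2/, ...).
make_filemap() {
    prefix=$1   # same prefix as used in section 2.1
    for q in ./*/"${prefix}".K*r*.Q; do
        [ -e "$q" ] || continue          # skip if the pattern matched nothing
        name=$(basename "$q" .Q)         # e.g. your_prefix.K3r2
        run_id=${name#"${prefix}".}      # e.g. K3r2
        K=${run_id#K}                    # e.g. 3r2
        K=${K%%r*}                       # e.g. 3
        printf '%s\t%s\t%s\n' "$run_id" "$K" "${q#./}"
    done | sort -t "$(printf '\t')" -k2,2n -k1,1
}
```

Example: make_filemap your_prefix > pong_filemap.txt, then pass pong_filemap.txt to -m.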