1. Data Formatting


1.1 LD Pruning


Software:

PLINK

Software download:

https://www.cog-genomics.org/plink

Reference script:

plink --vcf your_input_file.vcf --double-id --allow-extra-chr --maf 0.01 --geno 0.2 \
	--indep-pairwise 50 5 0.5 --out your_output_prefix
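In the command above, `--double-id` sets both the family ID and the individual ID to the VCF sample ID, and `--allow-extra-chr` lets PLINK accept chromosome names it does not recognise (for example scaffold names). `--maf 0.01` and `--geno 0.2` drop variants with a minor allele frequency below 1% or more than 20% missing calls, and `--indep-pairwise 50 5 0.5` slides a 50-variant window forward 5 variants at a time, greedily pruning variants until no pair in the window has r² above 0.5. PLINK 1.9 writes the retained variant IDs to `<prefix>.prune.in` and the pruned ones to `<prefix>.prune.out`. The helper below is a minimal sketch (the function name `prune_summary` is hypothetical, not part of PLINK) that counts both lists as a quick check before step 1.2.

```bash
# Sketch: summarise the lists written by PLINK --indep-pairwise.
# Assumes <prefix>.prune.in and <prefix>.prune.out exist, one variant ID per line.
prune_summary() {
	local prefix=$1
	local kept removed
	kept=$(wc -l < "${prefix}.prune.in")
	removed=$(wc -l < "${prefix}.prune.out")
	# $(( )) strips the leading spaces some wc implementations print
	echo "kept=$((kept)) removed=$((removed)) total=$((kept + removed))"
}
```

Usage: `prune_summary your_output_prefix`.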
                

1.2 Extracting SNPs


Software:

PLINK

Software download:

https://www.cog-genomics.org/plink

Reference script:


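plink --vcf your_input_file.vcf --double-id --allow-extra-chr --extract your_file \
	--recode vcf-iid --out your_pruned_prefix   # your_file: variant IDs to keep, e.g. the .prune.in file from step 1.1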
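`--extract` keeps only the variants whose IDs appear in the list, and `--recode vcf-iid` writes them to `<prefix>.vcf`, naming each sample by its individual ID only. If some listed IDs are absent from the input, or IDs are duplicated, the output will not contain exactly one record per listed ID, so comparing the two counts is a cheap sanity check. The sketch below assumes an uncompressed VCF; `count_vcf_variants` and `check_extract` are hypothetical helper names.

```bash
# Sketch: compare the number of records in an uncompressed VCF
# with the number of IDs passed to --extract.
count_vcf_variants() {
	# Count lines that are not header lines (header lines start with '#')
	grep -vc '^#' "$1"
}

check_extract() {
	local vcf=$1 id_list=$2
	local n_vcf n_ids
	n_vcf=$(count_vcf_variants "$vcf")
	n_ids=$(grep -c . "$id_list")    # non-empty lines in the ID list
	if [ "$n_vcf" -eq "$n_ids" ]; then
		echo "OK: $n_vcf variants"
	else
		echo "MISMATCH: VCF has $n_vcf variants, list has $n_ids IDs"
		return 1
	fi
}
```

Usage: `check_extract your_pruned_prefix.vcf your_file`.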

1.3 Converting the VCF File to .geno Format


Software:

vcf2geno

Software download:

https://github.com/zhanxw/vcf2geno

Reference script:
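
vcf2geno=/path/to/vcf2geno; input_file=your_input_file.vcf   # input_file: the VCF written in step 1.2
"$vcf2geno" "$input_file" your_output_file_name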

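sNMF reads genotypes in the .geno format: one line per SNP and one character per individual, with 0, 1 or 2 for the genotype and 9 for missing data. Before running sNMF it can be worth confirming that the converted file has this shape: the number of lines should match the number of variants from step 1.2 and the line width the number of samples. The awk-based `check_geno` function below is a hypothetical helper, sketched under the assumption that the file uses exactly these four codes and Unix line endings.

```bash
# Sketch: validate a .geno file (one line per SNP, one character per
# individual, codes 0/1/2 for genotypes and 9 for missing data).
# Prints "<n_snps> <n_individuals>" if the file looks valid; otherwise
# prints the first problem found and returns a non-zero status.
check_geno() {
	awk '
		NR == 1 { width = length($0) }
		length($0) != width {
			printf "line %d: expected %d characters, found %d\n", NR, width, length($0)
			bad = 1; exit
		}
		/[^0129]/ {
			printf "line %d: character other than 0, 1, 2 or 9\n", NR
			bad = 1; exit
		}
		END {
			if (!bad) print NR, width
			exit bad
		}
	' "$1"
}
```

Usage: `check_geno your_output_file_name`.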
                

2. sNMF Analysis


2.1 Structure Analysis


Software:

sNMF

Software download:

http://membres-timc.imag.fr/Olivier.Francois/snmf/index.htm

Reference script:

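sNMF=/path/to/sNMF                        # path to the sNMF executable
in_geno=/absolute/path/to/your_geno_file  # .geno file from step 1.3; use an absolute path because the loop below changes directory
prefix=your_prefix                        # prefix for the output file names
max_K=your_max_K                          # largest K (number of ancestral populations) to test
n_runs=your_number_of_runs                # number of replicate runs for each K

for K in $(seq 1 "$max_K"); do
	mkdir -p "$K"
	cd "$K" || exit 1

	for r in $(seq 1 "$n_runs"); do
		q_file=${prefix}.K${K}r${r}.Q
		g_file=${prefix}.K${K}r${r}.G
		log_file=${prefix}.K${K}r${r}.log
		seed=$(( r * 1000 ))              # a different, reproducible seed for each run

		echo "K=${K}, run ${r}"
		# -c: mask a fraction of genotypes and report the cross-entropy
		# -p 12: number of processes (threads)
		"$sNMF" \
			-x "$in_geno" \
			-K "$K" \
			-c \
			-q "$q_file" \
			-g "$g_file" \
			-s "$seed" \
			-p 12 \
			> "./${log_file}"
	done
	cd ..
done

# Collect the masked-data cross-entropy of every run into one file
grep "Cross-Entropy (masked data):" ./*/*.log > ./Cross-Entropy.txt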
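Because `grep` is given several files, each line of Cross-Entropy.txt is prefixed with the path of its log file, for example `./3/your_prefix.K3r2.log:Cross-Entropy (masked data): <value>`. A common heuristic is to choose the K at which the masked cross-entropy is lowest or stops decreasing noticeably, and then to take the run with the lowest cross-entropy for that K. The sketch below (hypothetical function `summarise_ce`) prints, for each K, the mean cross-entropy, the minimum and the log file of the best run. It assumes the `<K>/<prefix>.K<K>r<run>.log` layout of the loop above, that the value is the last field of each line, and that the prefix contains no "/" or ":".

```bash
# Sketch: summarise Cross-Entropy.txt from the sNMF loop.
# Assumed line format (grep prefixes each match with its file path):
#   ./<K>/<prefix>.K<K>r<run>.log:Cross-Entropy (masked data): <value>
# Output (tab-separated, sorted by K): K, mean, minimum, log file of the best run
summarise_ce() {
	awk '
		{
			split($0, path, "/")          # path[2] = K directory, path[3] = "<log>:<text>"
			k = path[2]
			split(path[3], part, ":")     # part[1] = log file name
			v = $NF + 0                   # the cross-entropy value is the last field
			sum[k] += v; n[k]++
			if (!(k in best) || v < best[k]) { best[k] = v; run[k] = part[1] }
		}
		END {
			for (k in sum) printf "%s\t%.4f\t%.4f\t%s\n", k, sum[k] / n[k], best[k], run[k]
		}
	' "$1" | sort -n
}
```

Usage: `summarise_ce Cross-Entropy.txt`.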

                

2.2 Aligning and Visualizing the Q Matrices


Software:

pong

Software download:

https://github.com/abehr/pong

Reference script:


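# -m: filemap listing the Q matrices; -l: color list; --greedy: greedy cluster matching; -s 0.9: similarity threshold for grouping runs into modes
pong -m your_filemap_path -l your_color_scheme_file_path --greedy -s 0.9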
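pong's `-m` option expects a filemap: a tab-delimited file with one line per Q matrix giving a unique run ID, the K value and the path to the Q matrix relative to the filemap (check the pong README for the exact rules). For the `<K>/<prefix>.K<K>r<run>.Q` layout created in step 2.1, the filemap can be generated rather than written by hand. `make_filemap` is a hypothetical helper, and the `K<K>r<run>` run IDs are an arbitrary choice.

```bash
# Sketch: write a pong filemap for the sNMF output layout
#   <K>/<prefix>.K<K>r<run>.Q
# Columns (tab-separated): run ID, K, Q matrix path relative to the filemap.
make_filemap() {
	local prefix=$1 max_K=$2 n_runs=$3
	local K r
	for K in $(seq 1 "$max_K"); do
		for r in $(seq 1 "$n_runs"); do
			printf 'K%sr%s\t%s\t%s/%s.K%sr%s.Q\n' "$K" "$r" "$K" "$K" "$prefix" "$K" "$r"
		done
	done
}
```

Run it in the directory where the sNMF loop was executed, e.g. `make_filemap your_prefix your_max_K your_number_of_runs > filemap.txt`, and pass `filemap.txt` to `-m`. If the K = 1 runs are not wanted in the plot, delete those lines from the filemap.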