When I work with two close relatives, it is interesting which genes are homologous and how chromosomal structures are conserved among those relatives. Synteny plot is useful to visualize how to compare multiple genomic sequences. MCScan is a tool I would like to introduce here. As an example I used Solanum lycopersicum and Solanum pennellii, which had been used in my lab.

Preparation

Donwload genome fasta files and gff files.

Install last

cd /usr/local/bin
sudo wget https://gitlab.com/mcfrith/last/-/archive/1584/last-1584.tar.gz
sudo tar -xvzf last-1584.tar.gz
cd last-1584
sudo make
cd /usr/local/bin
sudo ln -s sudo ln -s last-1584/bin/* ./

Now install JCVI (which inlcudes MCscan)

conda create -n jcvi python=3.10 scipy # as of 12/2024, version 3.13 does not allow installation to complete.
conda activate jcvi
pip install jcvi
brew install texlive #note: conda version of texlive is broken

activate jcvi

cd /Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/input
conda activate jcvi

format fasta files

python -m jcvi.formats.fasta format /Volumes/data_work/Data8/NGS_related/Sinha_lab/ITAG4.0_release/ITAG4.0_proteins.fasta SLitag4_0.pep

python -m jcvi.formats.fasta format /Volumes/data_work/Data8/NGS_related/Sinha_lab/Solanum_pennellii_v2/annotations/Spenn-v2-aa-annot.fa SPenn.pep

Convert GFF to BED (genome)

# Convert GFF to BED (genome)
#python -m jcvi.formats.gff bed --type=mRNA --key=Name /Volumes/data_work/Data8/NGS_related/Sinha_lab/ITAG4.0_release/ITAG4.0_gene_models.gff -o SLitag4_0.bed
python -m jcvi.formats.gff bed --type=mRNA --key=Name --primary_only /Volumes/data_work/Data8/NGS_related/Sinha_lab/ITAG4.0_release/ITAG4.0_gene_models.gff -o SLitag4_0.bed
python -m jcvi.formats.gff bed --type=mRNA --key=Name --primary_only /Volumes/data_work/Data8/NGS_related/Sinha_lab/Solanum_pennellii_v2/spenn_v2.0_gene_models_annot.gff -o SPenn.bed

Modify bed file for working orthologw

If Solyc00g500001.1.1 in fasta file, bed file has to use Solyc00g500001.1

(Do not) To remove .* at the end in fasta file gene name to match gene names in both files.

(Do) To remove .* at the endo in bed file gene name.

grep "\.5\.1" SLitag4_0.bed # specific to *.5.1.1
head SLitag4_0.bed  # others
# format
sed -i '' -E 's/(Solyc[0-9]+g[0-9]+\.[0-9]+)\.[0-9]/\1/' SLitag4_0.bed 
head SLitag4_0.bed
grep "\.5\.1" SLitag4_0.bed
head SLitag4_0.bed 

# Explanation gy ChatGPT
# Explanation of Parts:
#sed: The stream editor used for text transformations.
#
#-i '':
#
#The -i flag means "edit the file in place."
#The empty '' argument is specific to macOS sed and is used to indicate no backup file should be created. On Linux, you could use -i without the empty quotes or specify a backup suffix like -i.bak.
#-E: Enables extended regular expressions (ERE), which allow more advanced pattern matching than basic sed syntax.

#'s/.../.../':

#This is the sed substitution command. The syntax is:
#s/<pattern>/<replacement>/
#s/ indicates a substitution.
#The first part (<pattern>) defines what to search for.
#The second part (<replacement>) defines what to replace it with.
#Pattern Explanation ((Solyc[0-9]+g[0-9]+\.[0-9]+)\.[0-9]):

#Solyc: Matches the literal string Solyc.
#[0-9]+: Matches one or more digits (representing chromosome or gene numbers).
#g: Matches the literal character g.
#[0-9]+: Matches one or more digits (representing additional gene details).
#\.: Matches a literal period (.).
#[0-9]+: Matches one or more digits (possibly a version or sub-feature number).
#\.[0-9]: Matches another period followed by a single digit. This is the part you want to remove.
#( ... ): Groups part of the pattern so it can be referenced in the replacement using \1.
#Replacement (\1):

#\1 refers to the first group captured in parentheses ((Solyc[0-9]+g[0-9]+\.[0-9]+)).
#This keeps the matched part before the second .[0-9] while excluding the trailing .X.
#File (SLitag4_0.bed):

#This is the input file being edited in place.
#What It Does:
#This command removes the trailing .X (where X is a single digit) from strings that match the pattern Solyc<number>g<number>.<number>.<number> in the file SLitag4_0.bed. The modified file is saved in place without creating a backup.

# For SPenn.bed
head SPenn.bed > SPenn.test.bed
sed -i '' -E 's/(Sopen[0-9]+g[0-9]+)\.[0-9]$/\1/' SPenn.test.bed 
cat SPenn.test.bed
# real
sed -i '' -E 's/(Sopen[0-9]+g[0-9]+)\.[0-9]+/\1/' SPenn.bed 
cat SPenn.bed

find orthologs

cd ../../output/
ln -s ../input/* ./

python -m jcvi.compara.catalog ortholog --dbtype prot SLitag4_0 SPenn

#You can use awk to remove the “-” from the first column (chromosome name) and save the result. This is to have simple chromosome name in the synteny plot.

## for M82
awk '{gsub("SL4.0", "", $1); print}' SLitag4_0.bed > SLitag4_0_cleaned.bed
## for SPenn
awk '{gsub("Spenn-", "", $1); print}' SPenn.bed > SPenn_cleaned.bed

change space to “ in .bed using BBE editor

Write layout file

write_file(
"# y, xstart, xend, rotation, color, label, va,  bed
 .6,     .1,    .8,       0,   m, M82, top, SLitag4_0_cleaned.bed
 .4,     .1,    .8,       0,   k, SPenn, top, SPenn_cleaned.bed
# edges
e, 0, 1, SLitag4.0.SPenn.anchors.simple
",
file="/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/layout3")

Set chromosomes to plot and order.

This chromosome names has to be matched to the bed files described in layout file and each line correspond to each genotype (i.e. two genotypes needs two lines).

# M82 chromosome name
M82_ids <- read_delim("/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/SLitag4_0_cleaned.bed",col_names = FALSE) %>% count(X1) %>% dplyr::select(X1) %>% as.vector()
# SPenn chromosome name
SPenn_ids <- read_delim("/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/SPenn_cleaned.bed",col_names = FALSE) %>% count(X1) %>% dplyr::select(X1)  %>% as.vector()  
# combine them
seqIDs <- list(str_c(M82_ids$X1,collapse=","),
               str_c(SPenn_ids$X1,collapse=",")
               )
# write two lines
write_lines(seqIDs, "/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/seqids4") 

Plot

python -m jcvi.graphics.karyotype --outfile S.lycoM82_S.Penn_karyotype2.pdf seqids4 layout3
Karyotype
Karyotype

Microsynteny

Let’s move to local synteny.

python -m jcvi.compara.synteny mcscan SLitag4_0_cleaned.bed SLitag4_0.SPenn.lifted.anchors --iter=1 -o SLitag4_0.SPenn.i1.blocks

Select only chromosome 6 in M82

grep "Solyc06g" SLitag4_0.SPenn.i1.blocks > blocks.M82chr06    

write block layout

write_file(
"# x,   y, rotation,   ha,     va,   color, ratio,            label
0.5, 0.6,        0, left, center,       m,     1,       M82 Chr6
0.5, 0.4,        0, left, center, #fc8d62,     1, Penn
# edges
e, 0, 1",
file="/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/blocks1.layout")

microsynteny plot (M82 chromosome 6)

python -m jcvi.formats.bed merge SLitag4_0_cleaned.bed SPenn_cleaned.bed -o SLitag4_0_SPenn.bed
python -m jcvi.graphics.synteny blocks.M82chr06 SLitag4_0_SPenn.bed blocks1.layout   --outputprefix S.lycoM82_blocks.M82chr06

label “Solyc06g052070” ~ “Solyc06g072930”

python -m jcvi.graphics.synteny blocks.M82chr06 SLitag4_0_SPenn.bed blocks1.layout   --outputprefix S.lycoM82_blocks.M82chr06.v2 --genelabelsize=10 --genelabels=Solyc06g052070.3,Solyc06g072930.4

how about SPenn chr06?

Make SPenn blocks

python -m jcvi.compara.synteny mcscan SPenn_cleaned.bed SLitag4_0.SPenn.lifted.anchors --iter=1 -o SPenn_SLitag4_0.i1.blocks

select only S. Penn chromosome 6

grep "Sopen06g" SPenn_SLitag4_0.i1.blocks > blocks.SPennchr06 

write block layout

write_file(
"# x,   y, rotation,   ha,     va,   color, ratio,            label
0.5, 0.6,        0, left, center,       m,     1,       M82
0.5, 0.4,        0, left, center, #fc8d62,     1, Penn Chr6
# edges
e, 0, 1",
file="/Volumes/data_work/data8/NGS_related/Sinha_lab/Sinteny_MCscan/output/blocks.SpennChr06.layout")

microsynteny plot (SPenn chromosome 6)

python -m jcvi.graphics.synteny blocks.SPennchr06 SLitag4_0_SPenn.bed blocks.SpennChr06.layout   --outputprefix S.lycoM82_blocks.SPennchr06
M82 chr06 synteny
M82 chr06 synteny

visualization of synteny plot by JBrowse2

Install JBrowse2 app on a computer

Import .anchor.simple file together with .bed files

JBrowse2 synteny
JBrowse2 synteny
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
##  [5] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
##  [9] ggplot2_3.5.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5      jsonlite_1.8.9    compiler_4.4.0    tidyselect_1.2.1 
##  [5] jquerylib_0.1.4   scales_1.3.0      yaml_2.3.10       fastmap_1.2.0    
##  [9] R6_2.5.1          generics_0.1.3    knitr_1.48        bookdown_0.41    
## [13] munsell_0.5.1     tzdb_0.4.0        bslib_0.8.0       pillar_1.9.0     
## [17] rlang_1.1.4       utf8_1.2.4        stringi_1.8.4     cachem_1.1.0     
## [21] xfun_0.49         sass_0.4.9        timechange_0.3.0  cli_3.6.3        
## [25] withr_3.0.1       magrittr_2.0.3    digest_0.6.37     grid_4.4.0       
## [29] rstudioapi_0.16.0 hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5      
## [33] evaluate_1.0.0    glue_1.7.0        blogdown_1.19     fansi_1.0.6      
## [37] colorspace_2.1-1  rmarkdown_2.28    tools_4.4.0       pkgconfig_2.0.3  
## [41] htmltools_0.5.8.1