Arctic Charr anonymous marker placement on genetic map

B. Sutherland
2017-07-20
This methods description accompanies the results presented in the following manuscript:
Moore J.-S., Harris L. N., Le Luyer J., Sutherland B. J. G., Rougemont Q., Tallman R. F., Fisk A. T., Bernatchez L., 2017 Migration harshness drives habitat choice and local adaptation in anadromous Arctic Char: evidence from integrating population genomics and acoustic telemetry. bioRxiv: 1–39. doi: https://doi.org/10.1101/138545

Overview

Part 1 will anchor anonymous markers onto a high-density genetic map of Salvelinus alpinus (Nugent et al. 2017)
Part 2 will combine positioned markers with population genetic values (e.g. Fst) and plot in a GWAS figure

Clone this repo and run all code from within the main repo directory.

Part 1: Position anonymous markers

A) Obtain input data and put in 02_data

From: Figshare data

  • Salp anon marker sequence file: salp_tags.csv

From: Supplemental Data from Nugent et al. 2017, G3

  • Salp map marker file: FileS1.xlsx
  • Salp map position file: FileS2.xlsx

i) Within FileS1.xlsx, save the sheet 'Map_SNPs' as a .csv entitled FileS1.csv

ii) Collect marker name and sequence from this file:
grep -v 'Polymorphism' FileS1.csv | awk -F, '{ print $1 "," $4 }' > salp_marker_and_seq.csv

iii) Within FileS2.xlsx, save the only sheet as a .csv file entitled FileS2.csv

iv) Collect only the markers that have a female linkage group and a map position (with minor reformatting of the LG names):
awk -F, '{ print $1","$3","$4 }' FileS2.csv | sed 's/,AC-/,AC/g' | sed 's/,-/,empty/g' | grep -vE 'NA|empty' - | grep -v 'Marker,Female,Map' - > ./salp_female_map.csv

v) Same as above, but collect the male map:
awk -F, '{ print $1","$2","$4 }' FileS2.csv | sed 's/,AC-/,AC/g' | sed 's/,-/,empty/g' | grep -vE 'NA|empty|UNA' - | grep -v 'Marker,Male,Map' | sed 's/m\,/\,/g' | sed 's/\-\,/\,/g' > ./salp_male_map.csv
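To see what the filter chain in steps iv and v keeps and drops, the female-map command can be run on a few made-up rows. This is a minimal sketch: the toy rows are invented for illustration and are not lines from FileS2.csv; only the column order (marker, male LG, female LG, position) is taken from the commands above.

```shell
# Toy stand-in for FileS2.csv (values are invented for illustration)
tmp=$(mktemp -d)
cat > "$tmp/FileS2.csv" <<'EOF'
Marker,Male,Female,Map
m1,AC-1,AC-1,10.5
m2,AC-2,-,3.2
m3,AC-3,AC-3,NA
m4,AC-4,AC-4,0
EOF

# Same chain as step iv: keep marker, female LG and position; strip the dash
# from LG names; flag missing LGs as "empty"; drop incomplete rows and header
awk -F, '{ print $1","$3","$4 }' "$tmp/FileS2.csv" |
    sed 's/,AC-/,AC/g' |
    sed 's/,-/,empty/g' |
    grep -vE 'NA|empty' |
    grep -v 'Marker,Female,Map' > "$tmp/salp_female_map.csv"

cat "$tmp/salp_female_map.csv"
```

In this toy input, m2 (no female LG) and m3 (no position) are dropped. Because `grep -vE 'NA|empty'` matches anywhere on the line, a marker whose name contains `NA` would be dropped as well.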

vi) Finish preparing the data using R (i.e. fix formats, give AC-20 and AC-4 continuous names, and use cumulative positions instead of positions split by LG arm).
I suggest opening the following script in RStudio and setting the working directory to this repo. The script has to be run twice, changing the setting from female to male, to produce both maps.
01_scripts/salp_collect_information.R

This will produce:
salp_female_merged_sorted_clean.csv (1656 records)
salp_male_merged_sorted_clean.csv (1489 records)
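
Step vi mentions replacing arm-split positions with cumulative positions. The sketch below illustrates the general idea only and is not the logic of 01_scripts/salp_collect_information.R: the arm labels `AC4a` and `AC4b`, the three-column layout and the values are all hypothetical.

```shell
# Hypothetical toy input: marker, LG arm, position within that arm
tmp=$(mktemp -d)
cat > "$tmp/arms.csv" <<'EOF'
m1,AC4a,0
m2,AC4a,12.5
m3,AC4b,2.5
m4,AC4b,7.5
EOF

# Pass 1 (NR == FNR) records the largest position on the first arm.
# Pass 2 renames both arms to one LG and shifts second-arm positions
# by that offset, so positions run continuously along the merged LG.
awk -F, -v OFS=, '
    NR == FNR { if ($2 == "AC4a" && $3 + 0 > max) max = $3 + 0; next }
    $2 == "AC4a" { print $1, "AC4", $3 + 0 }
    $2 == "AC4b" { print $1, "AC4", $3 + max }
' "$tmp/arms.csv" "$tmp/arms.csv" > "$tmp/merged.csv"

cat "$tmp/merged.csv"
```

The file is read twice so that the offset is known before any second-arm row is printed.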

Create a consensus file:
cat salp_male_merged_sorted_clean.csv salp_female_merged_sorted_clean.csv > consensus_merged_sorted_clean.csv

B) Format data for MapComp

i) Move to the data folder and replace the anonymous marker label ‘alltags’ with ‘Salp.anon’. Also change the LG of the anonymous markers from 0 to 1.
sed 's/alltags/Salp.anon/g' salp_tags.csv | sed 's/anon,0/anon,1/g' > salp.anon_markers.csv
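
The effect of the two substitutions can be checked on a couple of made-up rows. This is a minimal sketch that assumes salp_tags.csv uses a six-column layout (species label, LG, position, total position, marker name, sequence); the rows themselves are invented.

```shell
# Hypothetical rows standing in for salp_tags.csv
tmp=$(mktemp -d)
printf '%s\n' \
    'alltags,0,0,0,tag_1,ACGTACGT' \
    'alltags,0,0,0,tag_2,TTGACCAA' > "$tmp/salp_tags.csv"

# Same command as above: relabel the species, then set LG 0 to LG 1
sed 's/alltags/Salp.anon/g' "$tmp/salp_tags.csv" |
    sed 's/anon,0/anon,1/g' > "$tmp/salp.anon_markers.csv"

cat "$tmp/salp.anon_markers.csv"
```

The second substitution keys on the text `anon,0`, so it only changes the LG field that directly follows the new species label and leaves the position fields at 0.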

ii) Confirm information on MapComp input files
wc -l salp.anon_markers.csv consensus_merged_sorted_clean.csv
3145 consensus_merged_sorted_clean.csv (mapped markers)
6230 salp.anon_markers.csv (anonymous markers)

iii) Combine all markers to make input for MapComp
cat salp.anon_markers.csv consensus_merged_sorted_clean.csv > salp.anon_salp.map.csv

iv) Move back up to the main folder, and clone in the MapComp repo
cd ..
git clone https://github.com/enormandeau/mapcomp.git

v) Move into the MapComp repo
cd mapcomp

vi) Follow instructions given at the top of the MapComp iterative script:
01_scripts/utility_scripts/remove_paired_anon_and_pair_again.sh

To obtain results as in the manuscript, run the iterative MapComp script with 10 iterations and the default distance setting, using the Atlantic Salmon reference genome as the intermediate genome:
ICSASG_v2
From: Lien et al., 2016. The Atlantic Salmon genome provides insights into rediploidization. Nature 533: 200–205.

C) Prepare and run MapComp iteratively

i) Copy the combined output from above into the mapcomp/02_data folder
cp ./../02_data/salp.anon_salp.map.csv ./02_data/markers.csv

ii) Convert the markers.csv file to a fasta file
./01_scripts/00_prepare_input_fasta_file_from_csv.sh ./02_data/markers.csv

iii) Check the number of records in markers.fasta
grep -c '>' 02_data/markers.fasta
9375 (6230 anonymous + 3145 mapped markers)
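
The same check can be scripted so that the FASTA record count is compared directly with the number of lines in the CSV it was built from. This is a minimal sketch on invented toy files; the `Salp.map` label and the FASTA header format are placeholders, not the names MapComp actually writes.

```shell
# Toy stand-ins for 02_data/markers.csv and 02_data/markers.fasta
tmp=$(mktemp -d)
printf '%s\n' \
    'Salp.anon,1,0,0,tag_1,ACGT' \
    'Salp.map,AC1,10.5,10.5,m1,TTGA' > "$tmp/markers.csv"
printf '%s\n' '>tag_1' 'ACGT' '>m1' 'TTGA' > "$tmp/markers.fasta"

# One CSV line should yield exactly one FASTA header line
n_csv=$(( $(wc -l < "$tmp/markers.csv") ))
n_fasta=$(grep -c '^>' "$tmp/markers.fasta")

if [ "$n_csv" -eq "$n_fasta" ]; then
    echo "OK: $n_fasta records"
else
    echo "Mismatch: $n_csv CSV lines vs $n_fasta FASTA records" >&2
fi
```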

iv) Prepare MapComp variables and parameters
Using vi, or similar, set the species name in the iterative mapping script, e.g. ANON="Salp.anon"
vi ./01_scripts/utility_scripts/remove_paired_anon_and_pair_again.sh

Set the max distance in the mapcomp script (e.g. 1000000 or 10000000)
vi ./mapcomp

Set the path to the genome file in both the following:
vi ./mapcomp
vi 01_scripts/01_bwa_align_reads.sh
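
If editing by hand is inconvenient, a setting such as `ANON` could also be changed non-interactively. The sketch below works on a hypothetical stand-in file; it assumes the real script assigns `ANON` at the start of its own line, which should be confirmed by opening the script first.

```shell
# Hypothetical stand-in for the iterative script (real contents not shown here)
tmp=$(mktemp -d)
printf '%s\n' '#!/bin/bash' 'ANON="species"' 'echo "pairing $ANON"' > "$tmp/script.sh"

# Rewrite only the line that assigns ANON, via a temporary copy
sed 's/^ANON=.*/ANON="Salp.anon"/' "$tmp/script.sh" > "$tmp/script.sh.new"

# Copy the contents back with cat so the original file keeps its permissions
cat "$tmp/script.sh.new" > "$tmp/script.sh"
rm "$tmp/script.sh.new"

grep '^ANON=' "$tmp/script.sh"
```

Writing the result back with `cat` rather than `mv` keeps the executable bit of the original file, which matters because the script is later run directly.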

v) Run MapComp iteratively
./01_scripts/utility_scripts/remove_paired_anon_and_pair_again.sh

vi) Collect the results; these will be used in Part 2.
awk '{ print $1","$5","$11 }' 03_mapped/pairings_out.txt > 03_mapped/Salp_mname_Salptotpos.csv
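
The extraction in step vi can be tried on invented input first. This is a minimal sketch: the real layout of pairings_out.txt is not described here, so the toy lines simply carry 11 whitespace-separated fields, and reading field 5 as the marker name and field 11 as the total position is an assumption based on the output file name.

```shell
# Hypothetical lines; f2..f10 are placeholders for fields not used here
tmp=$(mktemp -d)
printf '%s\n' \
    'Salp.anon f2 f3 f4 tag_1 f6 f7 f8 f9 f10 105.2' \
    'Salp.anon f2 f3 f4 tag_3 f6 f7 f8 f9 f10 870.0' > "$tmp/pairings_out.txt"

# Same awk command as step vi: keep fields 1, 5 and 11, comma-separated
awk '{ print $1","$5","$11 }' "$tmp/pairings_out.txt" > "$tmp/Salp_mname_Salptotpos.csv"

cat "$tmp/Salp_mname_Salptotpos.csv"
```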

vii) Copy the result file from the previous step into the folder salp_anon_to_salp/02_data

Part 2: Combine with population genetic values and plot

i) Open the script entitled 01_scripts/GWAS_from_MapComp_2016-11-02.R in R and follow instructions there. This will connect the test statistics from the analysis to the positional data of the markers, and will plot across the linkage map in delineated LGs. This will produce Figure 7 from the associated manuscript.

This will require the following inputs:

  • Index file (matches Index to identifier)
  • Position info (uses identifier; output from Part 1 Salp_mname_Salptotpos_consensus.csv)
  • Results for test statistics (uses Index; bayescan, PCAdapt, LFMM)
  • Map file (output from Part 1 consensus_merged_sorted_clean.csv)
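
How these inputs connect can be sketched outside R with a small awk join. This is only an illustration of the linking logic (Index to identifier, identifier to position), not the code in the R script; the file names, the column order of the index and statistics files, and all values are hypothetical. The position file follows the three-column layout produced in Part 1.

```shell
tmp=$(mktemp -d)
# Index file: Index, identifier
printf '%s\n' '1,tag_1' '2,tag_2' '3,tag_3' > "$tmp/index.csv"
# Position file: species label, identifier, total position
printf '%s\n' 'Salp.anon,tag_1,105.2' 'Salp.anon,tag_3,870.0' > "$tmp/positions.csv"
# Test statistics: Index, statistic (e.g. Fst)
printf '%s\n' '1,0.02' '2,0.15' '3,0.31' > "$tmp/stats.csv"

# Load the two lookup tables, then print identifier, position and statistic
# for every Index whose marker was positioned; unpositioned markers are skipped
awk -F, -v OFS=, '
    FILENAME == ARGV[1] { id[$1] = $2; next }
    FILENAME == ARGV[2] { pos[$2] = $3; next }
    ($1 in id) && (id[$1] in pos) { print id[$1], pos[id[$1]], $2 }
' "$tmp/index.csv" "$tmp/positions.csv" "$tmp/stats.csv" > "$tmp/joined.csv"

cat "$tmp/joined.csv"
```

In this toy example tag_2 has no position and is therefore absent from the joined output.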
