{\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf330
{\fonttbl\f0\froman\fcharset77 TimesNewRomanPSMT;\f1\froman\fcharset77 TimesNewRomanPS-BoldMT;\f2\fmodern\fcharset77 Courier;
\f3\froman\fcharset77 Times-Roman;\f4\fnil\fcharset77 LucidaGrande;\f5\froman\fcharset77 TimesNewRomanPS-ItalicMT;
}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww13980\viewh13700\viewkind0
\pard\tx565\tx1133\tx1700\tx2266\tx2832\tx3401\tx3967\tx4535\tx5102\tx5669\tx6235\tx6802\ql\qnatural
\f0\fs28 \cf0 This ridiculously detailed guide to conducting a parametric bootstrap analysis (e.g., the SOWH test) was written by Andrew J. Crawford in the middle of 2005. I hope the instructions make sense to you. If you have questions or complaints, please email me at: andrew@dna.ac\
\
These instructions were written with Mac OS X and Darwin Unix in mind, but much of this guide should apply to other operating systems as well. \
\
\
OVERVIEW:\
\
First, you need some a priori hypotheses of topological relationships. I recommend you think carefully about this. Also, present your hypotheses to a critical friend first and discuss them. I recommend constraining the minimum number of nodes needed to express your a priori hypothesis. If you constrain more nodes than absolutely necessary, you will (a) be more likely to reject your hypothesis, but (b) possibly reject it for the wrong (unintended/uninteresting) reasons. Also, consider alternative outcomes and how you might interpret them. Consider the undesirable but possibly necessary option of removing taxa if their inclusion precludes clear alternative a priori topological hypotheses (constraints) AND they have no bearing on the hypotheses under consideration.\
\
Second, of course you need an ML tree. Do a thorough search so that you are pretty sure you found the best tree. If you have more than one equally likely tree, I don't know what you'd do. Maybe you'd make a consensus of the best trees and use that as your best/optimal tree topology. Your best tree = H1.\
\
Third, you need to find the best tree possible under your constraint. Your constraint is your a priori hypothesis. Your best constrained tree = H0. From this you calculate the difference, D, in tree score between your H0 and H1 trees. This is your test statistic. \
\
Fourth, you need to simulate 100+ data sets. Simulations will be based on your H0 topology (i.e., constrained-ML tree) and the parameter values most appropriate for that constrained tree. Do not use the parameter values obtained for your unconstrained (H1) ML tree. \
\
Fifth, you need to do a pair of tree searches on each simulated data set. One search will be unconstrained and the other search will apply the a priori hypothesis as a constraint. This will give you 100+ pairs of tree scores. Find the difference, D, in tree scores for each pair. Compare your observed D value to this distribution of D values to find out whether you can significantly reject your H0 (a priori hypothesis) or not. \
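\
As a rough illustration of that final comparison, the sketch below computes a one-tailed p-value as the fraction of simulated D values that are at least as large as the observed D. The function name and the example numbers are hypothetical, and counting ties toward H0 is an assumed convention rather than something this guide prescribes.\
\
```python
def sowh_p_value(observed_d, simulated_ds):
    """Fraction of simulated D values that are >= the observed D."""
    if not simulated_ds:
        raise ValueError("need at least one simulated D value")
    # Count replicates whose D is at least as extreme as the observed D.
    n_extreme = sum(1 for d in simulated_ds if d >= observed_d)
    return n_extreme / len(simulated_ds)


# Hypothetical numbers: D from the real data and from 10 simulated data sets.
observed = 12.5
simulated = [0.0, 0.4, 1.1, 2.0, 2.3, 3.7, 4.2, 5.0, 8.9, 13.1]
print(sowh_p_value(observed, simulated))
```
\
With these made-up numbers only one of the ten simulated values reaches the observed D, so the function returns 0.1. With 100+ real replicates the same calculation applies unchanged.\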
\
Note, HOW you compare H1 vs. H0 will depend on the size of your data set (and computer power available to you):\
For tiny data sets: you will have the luxury of running full ML searches, without fixing parameter values. \
For modest data sets: you will have to fix parameter values but you can still run ML on each simulated data set.\
For larger data sets: you will have to obtain the Tree Length (MP score) for your ML trees (H1 and H0), use the difference in tree length (TL) as your test statistic, and then evaluate all your simulated data under the parsimony criterion. \
\
GOAL:\
\
Your goal is basically to prepare/have the following documents/information:\
1. original data file, nexus format\
2. Best ML model of evolution and estimates of parameter values (do full ML search, if possible)\
3. Best ML tree. This topology = "H1"\
4. Constraints (easily created in MacClade)\
5. Parameter values estimated from constrained-ML tree. \
Your best option is a full likelihood search estimating all parameter values. If that's too slow, \
you can assume modeltest values first, then put the resulting constrained-ML tree back in modeltest\
and rerun, then use model-averaged values provided by modeltest v. 3.6 and later. \
6. ML tree found WITH constraint (a priori hypothesis). This topology = "H0"\
7. seq-gen commands. I save these as txt files, then in unix: "chmod 755 filename" \
Then you can execute the complicated seq-gen cmd each time by typing: "./filename"\
And you can edit this txt file for your future simulations. \
8. Text file with PAUP commands that seq-gen will insert in between simulated data sets. \
Two sets of commands are needed: one set to find best tree and one to find best-constrained tree.\
9. A bit of text with PAUP settings you will add to the top of your nexus file of simulated data.\
\
\
\
DETAILED PROCEDURE:\
\
\f1\b I. Make your constraint trees.
\f0\b0 \
\
A. Make your constraint trees.\
\
I recommend doing this step early on, because you may realize that you wanted to delete some taxon or other, and it's best you do so now rather than later. Also, I recommend you work out all your possible scenarios/hypotheses before running anything. \
\
1. Load your nexus file into MacClade. Go to the Windows menu and select "Tree Window". Move branches around until you have your desired tree topology. This might be a mostly-bush with one or more groups of samples constrained to be monophyletic clades. Go to the Trees menu, select "Store Tree..." and give it a name. For multiple a priori hypotheses, make new constraints and store them. Then save the file AND under the Trees menu also select "Save Tree File as..." just in case. \
\
\f1\b II. Get best trees without and with constraints.
\f0\b0 \
\
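Note, even if you want to evaluate your test statistic using parsimony (e.g., because it's computationally efficient and you have too many OTUs for ML), you still need the ML trees (H1 and H0) and the ML model and parameter values described in this section, because the simulations are based on them.\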
\
A. Get best ML model for your data set. \
\
1. Trim your data set as necessary. If necessary, take out the taxa you do not want to consider in your test. Perhaps your outgroups are not relevant. Also, fewer taxa will save you lots of time, as you will need to do 200+ searches on your simulated data later. After trimming, you will have to re-do Modeltest. \
\
a. I recommend first doing a ML search under your fixed parameter values, then changing the top lines of the modelblock to something like (assuming that your tree is in the same folder/directory as your modelblock and data files):\
\
\f2\fs22 BEGIN PAUP;\
log file=taxon_modtest_2nd_log.txt replace;\
set criterion=likelihood ;\
GetTrees file=ML_fixedparam.tre ; [ add notes-to-self in brackets ]\
End;
\f0\fs28 \
\
and change the name of the "scorefile" using the find-replace command in PAUP's text editor. Run modeltest on this new scorefile and then re-run the ML search using the new parameter values (and the new model, if the model changed). Use the AIC results, not the hLRT results. Better still, you could use this model for a full search and not have to use fixed parameter values, but that depends on your number of taxa, etc. \
\
b. Save the log and .tre files. The -Ln score of this tree you will use to compare against the -Ln of your constrained result/s (one or more H0's)\
\
2. Find best tree under constraint hypothesis. \
\
a. IF your data set is small enough to allow you time to run a full-ML search, then a constraint can be imposed in PAUP as follows:\
\
\f2\fs22 BEGIN PAUP;\
CONSTRAINTS vicar_Tcarib = \
(((27,14,6,20,15,17,13,29,1,25,18,28,24,19),(12,16)),2,3,26,21,9,10,22,11,8,7,23,5,4); \
set criterion=likelihood ;\
Lset BaseFreq=estimate Nst=6 Rmat=estimate \
Rates=gamma Shape=estimate Pinvar=estimate ;\
HSEARCH Start=NJ swap=TBR [SPR,NNI,NONE] enforce=YES Constraints=vicar_Tcarib ; \
savetrees file=vicar_Tcarib_ML.tre brlens=yes ;\
END;
\f0\fs28 \
\
i. You will also need model and parameter values for your model estimated from this constrained analysis. Thus, your constrained hypothesis could need a DIFFERENT model! As mentioned above, your best option is a full likelihood search estimating all parameter values under the recommended model (probably the same as the model assumed for the unconstrained search, but it need not be!). If that's too slow, you can assume modeltest values first, then put the resulting constrained-ML tree back in modeltest (see 1a. above) and rerun, then see what model you get and also use model-averaged values provided by modeltest v. 3.6 and later as your final constrained-ml parameter values. \
\
Question: If modeltest on the constrained tree recommends a different model, do you assume that model or keep the old one? If you DO assume the new model, I imagine you must re-calculate the -Ln of your unconstrained tree under this new model; otherwise your test statistic (the difference in -Ln) will be comparing different models, which can't possibly be valid. Until I can confirm which is better, I recommend that you compare the optimal (unconstrained) and constrained trees under the model recommended by modeltest when run on the constrained tree. (Just don't compare -Ln's obtained with different models.)\
\
\f1\b NOTE:
\f0\b0 All simulations need to be done on the constrained tree and assuming the parameter values obtained from the constrained tree search. As I mentioned, I also think that if the constrained modeltest comes up with a new model then you have to use that (possibly you'll have to repeat the unconstrained (H1) ML search). \
\
b. IF your data set is so large that you need to run ML searches with fixed parameter values, I recommend this iterative protocol:\
i. Find your ML tree with constraint by using the parameter values obtained by your modeltest run above.\
ii. Re-run modeltest using this constrained topology (see above for how to change your modeltest start tree). Be sure to re-name your modelblock file and in it change the name of the logfile.txt and change the name of the scorefile. \
iii. Run a new ML search with these new parameter values, with your topological constraint of interest. I recommend using the "model averaged" values at the bottom of the modeltest output. \
\
The -Ln of this tree minus the -Ln score of your unconstrained ML tree is your Test Statistic. Make sure these two scores assumed the same model of evolution (though they probably used different values for the parameters).\
\
\
\f1\b III. Preparing files for Seq-Gen. Now you are gonna get the distribution of that Test Statistic.
\f0\b0 \
\
You will create simulated data sets (perhaps 100 to 1000?) under the model and parameters selected for your CONstrained tree. \
\
Here's what a command line for Seq-gen might look like. Prepare this info in a text file.\
\
\f2\fs24 seq-gen -mrev -l1434 -n500 -a0.3543 -g4 -i0.12345 -f0.3125,0.2835,0.1136,0.2904 -r1.4002,12.7675,1.5995,0.2579,8.5780,1.0 -on -xPacDisp_PAUPcmds.txt < Pac_disp_ml_1_0poly.tre > PacDisp_simdata.nex
\f0\fs28 \
\
-m = a model of sequence evolution. rev = GTR or subsets thereof. HKY is another option. See seq-gen documentation for more. \
-l = number of base pairs to simulate. This should be the actual number analyzed (i.e., after excluded gaps, etc.) in your ML analyses. \
-n = number of simulated data sets you want to prepare. \
-a = your alpha shape parameter for the gamma (G) distribution of rate heterogeneity, e.g., the "+G" in many ML models. \
-g = number of discrete rate categories describing the G distribution. Default in PAUP is 4, and few people ever change this. \
-i = proportion of invariant sites, i.e., the "+I" in ML models.\
-f = frequencies of bases. PAUP and seq-gen read base frequencies in the SAME order, thankfully. \
For nucleotide frequencies, Seq-Gen interprets them in this order:
\f3 -f A, C, G and T\
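In PAUP, base frequencies are written in this order: BASEFREQ = (frqA frqC frqG). The frequency of T is implied (1 minus the sum of the other three), so calculate it and give Seq-Gen all four values.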
\f0 \
-r = rate matrix. Thankfully these appear in the same order as in PAUP. (see below.)\
In PAUP rates for nst=6 are written as follows: RMATRIX = (rAC rAG rAT rCG rCT). The sixth rate, rGT, is fixed at 1.0 and not listed.\
In Seq-gen you write the rates in this order: -r A to C, A to G, A to T, C to G, C to T and G to T, respectively, so append 1.0 for G to T.\
-o = output format. n = nexus format.\
-x = name of file containing text you want inserted in between each simul. data set. \
< indicates 'infile' as in unix. In this case it means the H0 (constrained) tree file (see below).\
> indicates 'outfile' as in unix. In this case it means the name you want to give to your simulated data set. \
\
If you do not want a particular parameter, just delete it (instead of writing 0.0000).\
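\
To reduce copy-and-paste mistakes, the command line can also be assembled by a small script. The sketch below is only an illustration: the function names are hypothetical, it uses only the flags described above, and the example values are an arbitrary GTR+I parameter set in PAUP's layout (three base frequencies, five rates).\
\
```python
def paup_to_seqgen_freqs(freq_a, freq_c, freq_g):
    """PAUP lists A, C, G; Seq-Gen wants A, C, G, T (T = 1 - sum)."""
    freq_t = 1.0 - (freq_a + freq_c + freq_g)
    return [freq_a, freq_c, freq_g, freq_t]


def paup_to_seqgen_rates(rmat):
    """PAUP lists 5 rates (AC AG AT CG CT); append G-T, which is 1.0."""
    if len(rmat) != 5:
        raise ValueError("expected the 5 rates printed by PAUP")
    return list(rmat) + [1.0]


def fmt(values):
    """Comma-separated with no white space, as written after -f and -r."""
    return ",".join("%.4f" % v for v in values)


def seqgen_command(length, reps, freqs, rates, cmds_file, tree_file, out_file,
                   alpha=None, pinvar=None, ncat=4):
    """Build the seq-gen command line; omitted parameters are left out."""
    parts = ["seq-gen", "-mrev", "-l%d" % length, "-n%d" % reps]
    if alpha is not None:
        # Gamma shape and the number of discrete categories go together.
        parts += ["-a%.4f" % alpha, "-g%d" % ncat]
    if pinvar is not None:
        parts.append("-i%.4f" % pinvar)
    parts += ["-f" + fmt(freqs), "-r" + fmt(rates), "-on", "-x" + cmds_file]
    return " ".join(parts) + " < " + tree_file + " > " + out_file


# Example values only (a GTR+I model with no gamma shape parameter).
freqs = paup_to_seqgen_freqs(0.3101, 0.2886, 0.1094)
rates = paup_to_seqgen_rates([1.5284, 20.0692, 1.8064, 0.2256, 10.1996])
print(seqgen_command(1434, 500, freqs, rates, "PacDisp_PAUPcmds.txt",
                     "Pac_disp_ml_1_0poly.tre", "PacDisp_simdata.nex",
                     pinvar=0.6477))
```
\
Leaving alpha or pinvar unset simply drops the corresponding flag, which matches the advice to delete a parameter rather than writing 0.0000.\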
\
1. Modify the file containing your constrained tree (which was found under different parameter values, and maybe a different model, than the H1 tree). \
\
a. If you ended up with two or more ML trees, you might pick one arbitrarily. Or, pick the one with fewer polytomies (notice in TreeEdit that it indicates how many possible trees each ML tree might represent - pick the one with the smaller number). To delete all but one tree, open the .tre file in a text editor (e.g., TextEdit or PAUP), and simply delete the other trees indicated at the bottom of the file in Newick format. \
\
b. Open your constrained.tre file (aka H0) in TreeEdit. Under the Trees menu select "Resolve Tree." Select "Resolve using zero branch lengths". Then save under a new name, e.g., ending with "_0poly.tre" if you want.\
\
c. Open this _0poly.tre file in a text editor such as TextWrangler or TextEdit (or whatever). You will see lots of XML notation that you will delete completely, leaving only your tree in Newick (parenthetical) format, followed by a semi-colon. \
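\
Before handing the edited file to seq-gen, it can be worth checking that it really holds exactly one tree and no leftover text. The following is a minimal sketch under simple assumptions (the function name is hypothetical, and taxon labels are assumed not to contain semicolons or parentheses); it is not a full Newick parser.\
\
```python
def check_single_newick(text):
    """Return the one Newick tree found in text, or raise ValueError."""
    # Trees end with a semicolon; drop empty pieces left by trailing blanks.
    trees = [piece.strip() for piece in text.split(";") if piece.strip()]
    if len(trees) != 1:
        raise ValueError("expected 1 tree, found %d" % len(trees))
    tree = trees[0]
    if not tree.startswith("("):
        raise ValueError("leftover non-Newick text before the tree")
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth < 0:
            raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return tree + ";"


# Typical use would be: check_single_newick(open("my_0poly.tre").read())
print(check_single_newick("((A:0.1,B:0.2):0.0,C:0.3);"))
```
\
A file that still contains several trees, or stray markup before the first parenthesis, raises an error instead of silently producing simulations from the wrong input.\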
\
2. Make your seq-gen command file in text editor (e.g., TextEdit) which again will look like this:\
\
\f2\fs24 seq-gen -mrev -l1434 -n500 -a0.3543 -g4 -i0.12345 -f0.3125,0.2835,0.1136,0.2904 -r1.4002,12.7675,1.5995,0.2579,8.5780,1.0 -on -xPacDisp_PAUPcmds.txt < Pac_disp_ml_1_0poly.tre > PacDisp_simdata.nex
\f0\fs28 \
\
3. Paste in the name of your constrained ML tree file (H0) you just created in 1., above, e.g.:
\f2\fs24 Pac_disp_ml_1_0poly.tre
\f0\fs28 \
4. pick a name for your simulated data set, e.g.:
\f2\fs24 PacDisp_simdata.nex
\f0\fs28 \
5. Enter the length of DNA sequence (actual number of bases analyzed in your real tree (e.g., after removing gaps, etc.))\
6. Enter number of replicate data sets you want. \
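example 1: if MP tree searches take 8-16 minutes each, try 100 replicates (100 replicates x 2 searches x 8-16 minutes is roughly 27-53 hours).\
example 2: if fixed-parameter ML analyses take 30-90 seconds each, try 500 reps (500 replicates x 2 searches x 30-90 seconds is roughly 8-25 hours).\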
7. Find your CONstrained tree ML results (full-ML or iterative-fixed-parameter-ML) and enter the appropriate ML model parameter values. (Note, there is no white space after the "-r". See above for order of base freqs. and rate matrix.)\
8. Pick a name for the file that you are about to make containing your PAUP search commands, e.g.:
\f2\fs24 PacDisp_PAUPcmds.txt
\f0\fs28 \
\
9. Making the file containing your PAUP search commands.\
\
This file contains text that seq-gen will insert between every block of simulated data. In the above seq-gen command line example, this file is called "
\f2\fs24 PacDisp_PAUPcmds.txt
\f0\fs28 ". In this text file there are 2 sets of commands: the first finds the optimal tree, the second finds the optimal tree under your topological constraint. Here is an example for fixed parameters under a GTR+I model: \
\
\f2\fs22 begin PAUP ; \
[! * * * Calculating ML scores, UNconstrained * * *]\
ClearTrees NoWarn=yes ;\
Set criterion=likelihood ;\
Lset BaseFreq=(0.3101 0.2886 0.1094) Nst=6 Rmat=(1.5284 20.0692 1.8064 0.2256 10.1996) \
Rates=equal [Rates=gamma Shape=0.3456] Pinvar=0.6477 ;\
Hsearch start=NJ swap=TBR multrees=no status=no ;\
LScores /ScoreFile=Ln_vicar_Tcarib_best.txt APPEND=Yes BaseFreq=(0.3101 0.2886 0.1094) \
Nst=6 Rmat=(1.5284 20.0692 1.8064 0.2256 10.1996) [Rates=gamma Shape=0.3456] Pinvar=0.6477 ; \
SaveTrees file=vic_Tcar_sim_best.tre APPEND=yes BrLens=YES ;\
END;\
\
begin PAUP ;\
[! \
* * * Calculating ML scores, CONstrained * * *]\
ClearTrees NoWarn=yes ;\
CONSTRAINTS vicar_Tcarib = (((27,14,6,20,15,17,13,29,1,25,18,28,24,19),(12,16)),2,3,26,21,9,10,22,11,8,7,23,5,4); \
[ 'T-SR and BO-SR-T' ]\
Lset BaseFreq=(0.3101 0.2886 0.1094) Nst=6 Rmat=(1.5284 20.0692 1.8064 0.2256 10.1996)\
Rates=equal [Rates=gamma Shape=0.3456] Pinvar=0.6477 ;\
Hsearch start=NJ swap=TBR multrees=no status=no ENFORCE=YES Constraints=vicar_Tcarib ;\
LScores /ScoreFile=Ln_vicar_Tcarib_CON.txt APPEND=Yes BaseFreq=(0.3101 0.2886 0.1094) \
Nst=6 Rmat=(1.5284 20.0692 1.8064 0.2256 10.1996) Rates=equal [Shape=0.] Pinvar=0.6477 ;\
SaveTrees file=vic_Tcar_sim_CON.tre APPEND=yes BrLens=YES ;\
END;
\f0\fs28 \
\
Ideally, you'd do a full ML search estimating optimal parameter values for each data set, but that's only possible for the smallest of data sets (or fastest of computers). If using fixed ML values, be sure they are the best possible values obtained for the constrained (H0) tree (same as in the seq-gen command file). \
\
Here is an example for analyzing the simulated data sets under parsimony:\
\
\f2\fs22 begin PAUP;\
[! * * * Calculating MP scores, UNconstrained * * *]\
ClearTrees NoWarn=yes ;\
Hsearch start=stepwise addseq=random nreps=10000 savereps=yes randomize=trees \
rstatus=no hold=1 swap=TBR multrees=yes status=no ;\
Filter best=yes permdel=yes ;\
PScores /ScoreFile=TLs_tala_long_mono_best.txt APPEND=Yes ;\
SaveTrees file=tala_long_mono_sim_best.tre APPEND=yes BrLens=YES ;\
END;\
\
begin PAUP ;\
[! \
* * * Calculating MP scores, CONstrained * * *]\
ClearTrees NoWarn=yes ;\
Constraints tala_long_mono = (11,39,40,12,38,20,31,30,21,19,13,18,17,33,34,32,23,15,29,28,14,16,22,26,25,24,36,37,27,7,3,4,10,5,2,6,1,9,8,(41,42,35),43) ;\
Hsearch start=stepwise addseq=random nreps=10000 savereps=yes randomize=addseq \
rstatus=no hold=1 swap=TBR multrees=yes ENFORCE=YES Constraints=tala_long_mono ;\
Filter best=yes permdel=yes ;\
PScores /ScoreFile=TLs_tala_long_mono_cnstrnd.txt APPEND=Yes ;\
SaveTrees file=tala_long_mono_sim_cnstrnd.tre APPEND=yes BrLens=YES ;\
END;
\f0\fs28 \
\
For each analysis, you'll need to pick filenames for two tree files (best and constrained), pick filenames for two tree score files (best and constrained), update your ML parameter values (if assuming ML not MP), plug in a new constraint, and update its name and its name in the Hsearch command. \
\
You want to strive for the best possible MP search, given the time and computer power available. \
Normally, tree searches under constrained topologies run faster than unconstrained searches. \
\
10. Prepare a little text file for some PAUP commands that you will put at the top of your simulated data sets (which are going to be in PAUP format because your seqgen cmd contains the flag "
\f2\fs24 -on
\f0\fs28 "). You will need this file at the end of the next step. \
\
\f2\fs22 begin paup;\
log file=vicar_Tcar_PAUPsim.out.txt Replace=Yes ; [don't forget to add "log stop" at bottom.]\
Set criterion=likelihood ;\
set autoclose=yes ;\
set MaxTrees=1000 increase=auto AutoInc=100 ;\
Set TOrder=Right ;\
END;\
[You should also make the outfiles of the first command block say REPLACE where the rest say "APPEND"]
\f0\fs28 \
\
Save this file as something like "
\f4\fs24 add_top_simul_data
\f0\fs28 "\
\
\f1\b \
IV. Generating simulated data with seq-gen and modifying the resulting nexus file.
\f0\b0 \
\
1. Find the file containing the text you made that looked vaguely like this:\
\
\f2\fs24 seq-gen -mrev -l1434 -n500 -a0.3543 -g4 -i0.12345 -f0.3125,0.2835,0.1136,0.2904 \
-r1.4002,12.7675,1.5995,0.2579,8.5780,1.0 -on -xPacDisp_PAUPcmds.txt \
< Pac_disp_ml_1_0poly.tre > PacDisp_simdata.nex
\f0\fs28 \
\
Perhaps name it something like "Pac_disp_hyp.seqgen"\
\
2. Do yourself a huge favor and confirm you have ALL your files updated with the right parameter values and infile and outfile names, etc. Check all the files invoked by your ".seqgen" cmd file. Go back and look at your original PAUP/modeltest results and make sure all the parameter values are right. Etc. etc. \
\
3. At least the first time you make one of these files, you will have to change its read-write-execute privileges at the Unix prompt by invoking the change mode command, "chmod". \
\
a. At the Unix prompt type: "chmod 755 Pac_disp_hyp.seqgen" \
b. type the command "ls -Fla" to confirm that the left column looks like "-rwxr-xr-x" instead of "-rwx------". The x's mean the file is now executable. \
c. Now, as mentioned way above, you can execute the complicated seq-gen cmd each time by typing: "./Pac_disp_hyp.seqgen" and you can edit this text file for your future simulations. \
\
4. Be sure that seq-gen is somewhere on your path (either in your working directory or, e.g., in (or 'linked' to) your /bin/ directory). \
\
5. Execute seq-gen by typing "./Pac_disp_hyp.seqgen" or whatever your new seq-gen command file is called. Total run time is a few seconds. If it thinks you want to use 100's of trees, you might have forgotten to edit your .tre file (see above). \
\
6. Edit your new PAUP file! See instructions on your file that you might have named "
\f4\fs24 add_top_simul_data
\f0\fs28 " (see above). \
\
a. paste these instructions into the top of your PAUP file (below the first line containing "#NEXUS").\
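b. change "APPEND" to "REPLACE" in the PAUP commands that follow the first data block (only), so that output files left over from earlier runs are overwritten rather than extended. \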
c. add "LOG STOP" to the final PAUP block at the very very bottom of the file. \
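\
Steps a to c can also be scripted. The sketch below works on a list of lines and rests on some assumptions: the function name is hypothetical, each simulated data set is assumed to start with a line beginning "Begin DATA", and "LOG STOP" is placed in its own closing PAUP block instead of being typed into the last existing block.\
\
```python
import re


def prepare_paup_file(lines, header_lines):
    """Return edited lines; both inputs are lists without line endings."""
    if not lines or lines[0].strip().upper() != "#NEXUS":
        raise ValueError("first line should be #NEXUS")
    # Locate the start of each simulated data block (case-insensitive).
    starts = [i for i, line in enumerate(lines)
              if line.strip().lower().startswith("begin data")]
    if not starts:
        raise ValueError("no data blocks found")
    # The first replicate runs up to the second data block (or end of file).
    first_end = starts[1] if len(starts) > 1 else len(lines)
    # Step a: the settings block goes directly below the #NEXUS line.
    out = [lines[0]] + list(header_lines)
    for i, line in enumerate(lines[1:], start=1):
        if i < first_end:
            # Step b: only the first replicate overwrites old output files.
            line = re.sub("append", "REPLACE", line, flags=re.IGNORECASE)
        out.append(line)
    # Step c, written here as a separate closing block for simplicity.
    out += ["begin paup;", "log stop;", "end;"]
    return out


# Tiny made-up file with two replicates, just to show the effect.
demo = ["#NEXUS",
        "Begin DATA;", "END;", "begin PAUP;",
        "SaveTrees file=sim_best.tre APPEND=yes ;", "END;",
        "Begin DATA;", "END;", "begin PAUP;",
        "SaveTrees file=sim_best.tre APPEND=yes ;", "END;"]
for row in prepare_paup_file(demo, ["begin paup;", "set autoclose=yes ;", "END;"]):
    print(row)
```
\
To use it on a real file, read the lines with splitlines(), pass in the lines of your settings file as the second argument, and write the result out under a new name so the untouched seq-gen output is still available.\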
\
\f1\b V. Analyze these simulated data on PAUP
\f0\b0 \
\
Use the fastest machine available to you. If you're clever, you could even split the job into multiple blocks and run the analyses on different machines. But perhaps not the first time, when you haven't yet gone through the final steps. More likely you are testing multiple hypotheses in which case you can run each analysis in parallel. With practice, each analysis still takes me 30-60 minutes to organize! \
\
\f1\b VI. Analyzing your output
\f0\b0 \
\
A. Converting your two lscores output files into one Excel file. The following instructions use the unix text editor "emacs." Your goal here is to end up with one file that has two columns of numbers, one column containing -Ln scores of constrained trees and the other column of unconstrained tree scores. In other words, each row contains one constrained and one unconstrained -Ln score as calculated from the
\f5\i same
\f0\i0 simulated data set. This you will open in Excel (if you want) and then calculate the difference in scores in each row, and then plot the distribution of these differences (this plotting is a pain in Excel, but check the Help). \
\
1. Copy with new names your two tree score output files from your PAUP analysis of the simulated data sets. This way, if you mess something up you can go back to the originals.\
\
2. Open one of the files in emacs by typing at Unix prompt: "emacs filename_cnstrnd.txt" \
\
\
In UNIX environment:\
\pard\tx565\tx1133\tx1700\tx2266\tx2832\tx3401\tx3967\tx4535\tx5102\tx5669\tx6235\tx6802\li540\fi-540\ql\qnatural
\cf0 Copy (cp) each score file to a new filename you can play with (in case you mess up something and need to go back to the original).\
Open the copy in emacs.\
With the cursor at the top left, type the following two regexp commands. The first should get rid of every line that starts with a number other than '1'. The second should remove every line that starts with the number 1 but contains more than one digit.\
(1) M-x replace-regexp <RET> (<RET> means "hit the return key"). Then type: ^[^1][0-9]*[0-9]*cntrl-q-j (don't type any white spaces, and hold down the control key while hitting 'q' and then also while hitting 'j'). When it asks "replace... with:" just hit <RET> again. \
Now move the cursor back to the top of the file with the command "M-<"\
(2) M-x replace-regexp <RET> Then type: 1[0-9]+[0-9]*cntrl-q-j \
Move the cursor back to the top and now, just to double-check, try counting how many lines you have:\
(3) M-x how-many <RET> Then type: "Tree" The resulting number should be the number of simulated data sets you analyzed. \
Now get rid of the column containing just "Tree" and "1" by marking one corner of the column, going to the opposite corner, and killing it. If you are at the top-left, mark it with cntrl-SPC (hit the space bar while holding down the control key). Move to the bottom of the file with "M->". Position the cursor at the opposite corner of the column you want to kill. Now type: "c-x r k"\
Now delete the word "Length" in all but the top line, using an "M-x replace-string" command, replacing "Length" followed by cntrl-q-j, then <RET>. \
If you are working with your constrained tree scores, then you want to put the best tree scores after them, so you want to add a TAB after every element in your column. Like this: M-x replace-regexp <RET> Then: [0-9a-zA-Z]* <RET> Then: \\& followed by a TAB. ("\\&" stands for "that thing just found.")\
Also, be sure to label your column heading as 'best' (H1) or 'cnstrnd' (H0) or some such convention. \
NOW, do the above treatment on your other file of tree scores. At the bottom of one of these files, add some extra lines and type "c-x i" to insert the other file into the bottom of this file. Rename it ('save as') with "c-x c-w". \
Now insert one column next to the other. Kill one rectangle, as above ("c-x r k"), then move to the top of where you want to slip in the column you just 'killed' and 'yank' it back with "c-x r y".\
And hopefully you are done! Though you might clean up any blank lines/spaces at the bottom by marking the bottom of your data with "c-SPC", then moving to the bottom of your document with "M->", then whacking the intervening area by hitting "c-w". Save and close (c-x c-s, c-x c-c). \
\pard\tx565\tx1133\tx1700\tx2266\tx2832\tx3401\tx3967\tx4535\tx5102\tx5669\tx6235\tx6802\ql\qnatural
\cf0 \
Now you are ready to open this file in Excel (if you want), calculate the difference in scores in each row, and then plot the distribution of these differences. This plotting is a pain in Excel, but check the Help. In my version, I have to invoke some command by typing the unusual pair of keystrokes: cntrl-u then cmd- \
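\
If you would rather not do the pairing by hand, the same bookkeeping can be scripted. The sketch below makes an assumption about the score file layout, inferred from the editing steps above: every appended block has a header line starting with "Tree", followed by one line per tree with the tree number first and the score second. The function names and the sample lines are hypothetical.\
\
```python
def read_scores(lines):
    """Keep the score of tree 1 from each appended block of a score file."""
    scores = []
    for line in lines:
        fields = line.split()
        # Skip blanks, repeated header lines, and trees other than tree 1.
        if len(fields) < 2 or fields[0] != "1":
            continue
        scores.append(float(fields[1]))
    return scores


def paired_differences(constrained, unconstrained):
    """D for each replicate: constrained score minus unconstrained score."""
    if len(constrained) != len(unconstrained):
        raise ValueError("score files have different numbers of replicates")
    return [c - u for c, u in zip(constrained, unconstrained)]


# Made-up contents of the two score files (two replicates each).
# Real files could be loaded with open(name).read().splitlines().
best_lines = ["Tree -lnL", "1 5000.25", "Tree -lnL", "1 4990.50", "2 4990.50"]
con_lines = ["Tree -lnL", "1 5003.75", "Tree -lnL", "1 4991.00"]
diffs = paired_differences(read_scores(con_lines), read_scores(best_lines))
print(diffs)
```
\
For the made-up lines above the differences are 3.5 and 0.5. A negative difference in real output would suggest that the unconstrained search missed a better tree, since the constrained tree can never truly score better than the unconstrained optimum.\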
\
\
}