Colour coding:
# comments
> user commands
computer output
# First simulate two presence-absence maps on a 40x40 grid of squares.
# These maps will be compared for similarity.
# Each map is generated so that the probability of presence
# in every site is p.pres=0.2.
> mymaps <- makemap.func(40, 40, p.pres=0.2)
# View the maps that have been generated.
> mymaps
x | y | site | map1 | map2 |
1 | 1 | 1 | 0 | 1 |
2 | 1 | 2 | 0 | 0 |
3 | 1 | 3 | 0 | 0 |
4 | 1 | 4 | 1 | 1 |
5 | 1 | 5 | 0 | 0 |
6 | 1 | 6 | 0 | 1 |
7 | 1 | 7 | 0 | 0 |
8 | 1 | 8 | 0 | 0 |
9 | 1 | 9 | 0 | 0 |
: | : | : | : | : |
# Interpretation:
# x: the x-coordinate of the site. Ranges from 1 to 40 in the 40x40 grid,
# y: the y-coordinate of the site. Ranges from 1 to 40 in the 40x40 grid,
# site: site number given to each site. Ranges from 1 to 1600 in the example,
# map1: record of presence (1) or absence (0) in each site in the first map.
# but any real coordinates are acceptable.
# but any real coordinates are acceptable.
# but any set of unique numeric labels is acceptable.
# map2: record of presence (1) or absence (0) in each site in the second map.
# Plot images of the maps that have been generated:
> par(mfrow=c(2,2))
# (allow 2 rows and 2 columns per graph sheet)
> image(matrix(mymaps$map1, nrow=40), main="Map 1")
> image(matrix(mymaps$map2, nrow=40), main="Map 2")
# Find the best attainable match of map1 and map2, choosing the clique
# of any site (set of sites with which it can be "swapped") to be its eight
# nearest neighbours: horizontal, vertical, and diagonal.
# This means the clique contains all sites within a radius of the square root of 2
# of the central site.
# To avoid rounding errors, it may be advisable to add a small number to sqrt(2)
# and use radius = sqrt(2)+0.01 (say).
# The command is:
> bam.func(mymaps, radius=sqrt(2)+0.01)
# (note that map 1 and map 2 are randomly generated, so your results will
# be different.)
Singletons sweep 1 completed. Swap = 155
Exact connected component: swap = 2
Exact connected component: swap = 2
Exact connected component: swap = 2
SUMMARY
Swap attributable to singletons: 155
Swap attributable to exact connected components: 6
Number of 2-blocks: 0
Number of 3-blocks: 0
Number of 4-blocks: 0
Number of 5-blocks: 0
Total number of blocks: 0
Number of sites: 1600
Original match: 1075
Best possible swap: 161
Best attainable match: 1397
[1] 1397
# Interpretation:
# Most of the output can be ignored: details about singletons (1-blocks), exact
# The last four lines give the important output:
# The numbers are related via best attainable match = original match + 2 x (best possible swap).
# connected components, and other blocks just show how the algorithm is running.
# If the output freezes for several seconds and then reports a 12-block (say), it just indicates
# that the bipartite graph around which the algorithm is constructed is fairly complex.
# there are 1600 sites in each of the two maps (due to the 40x40 grid structure);
# the original match (1075) is the simple matching coefficient: the number of
# sites that had the same status (0-0 or 1-1) in maps 1 and 2;
# the best possible swap (161) is the total number of within-clique swaps possible;
# the best attainable match (1397) is the number of matched sites, after all
# within-clique swaps have been made.
# Instead of using the "radius" option, the clique structure can be entered directly.
# Alternatively, if several different maps are to be matched on the same set of
# sites and with the same radius, it will save a small amount of time to set up
# the clique structure once at the beginning and use it thereafter. Either way, the
# BAM algorithm should be invoked with the cliquesdat argument instead
# of the radius argument.
# The following example shows how to use the cliquesdat argument instead
# of the radius argument. The two arguments can not be used together.
# First create sets of maps on a 40x40 square grid:
> mymaps1 <- makemap.func(40, 40)
# Next create the clique structure for a 40x40 square grid with radius=1.5.
> mymaps2 <- makemap.func(40, 40)
# (It doesn't matter which map set is used for this step, as long as all map sets are
# on the same grid.)
# Store the clique structure in dataset mycliques.40x40.1.5.
> mycliques.40x40.1.5 <- makeclique.func(mymaps1, 1.5)
# View the first few lines of the cliques structure:
> mycliques.40x40.1.5[1:10,]
site | clique | |
1 | 1 | 2 |
2 | 1 | 41 |
3 | 1 | 42 |
4 | 2 | 1 |
5 | 2 | 3 |
6 | 2 | 41 |
7 | 2 | 42 |
8 | 2 | 43 |
9 | 3 | 2 |
10 | 3 | 4 |
: | : | : |
# Interpretation:
# The site column lists the sites from 1 to 1600.
# Now find the BAM for the maps in mymaps1:
# The clique column lists one by one the sites that belong to the clique of each site.
# Thus site 1 has in its clique sites 2, 41, and 42;
# site 2 has in its clique sites 1, 3, 41, 42, and 43;
# etc.
# The cliques must be entered in this format, one line for every pair of within-clique sites.
> bam.func(mymaps1, cliquesdat=mycliques.40x40.1.5)
[Answer: 1389; yours will be different]
# Find the BAM for the maps in mymaps2:
> bam.func(mymaps2, cliquesdat=mycliques.40x40.1.5)
[Answer: 1373; yours will be different]
# This has exactly the same effect as using the radius option directly:
> bam.func(mymaps1, radius=1.5)
[Answer: 1389 as above]
> bam.func(mymaps2, radius=1.5)
[Answer: 1373 as above]
# The difference is just that the "cliquesdat" option stores the clique structure
# permanently in object mycliques.40x40.1.5, and a little less calculation
# is required for bam.func in the first method than in the second method.
# For real problems, the map data will usually have to be read into R from
# an external source. The clique structure might be determined by the radius
# option, or read in manually along with the map data from an external file.
# For manually entering data, it is important to get the format of the data files exactly correct.
# The map data must have columns headed x, y, site, map1, and map2, although
# the ordering of the columns is not important. The x and y coordinates can be
# any real numbers; site labels should be numbers; and map1 and map2 should
# consist of 0s and 1s only. Any further columns of data will be ignored by the BAM functions.
# The data should be placed in a plain text file, space or tab delimited.
# The required format for the first few lines of the map data is below:
x | y | site | map1 | map2 |
31.1 | 16.2 | 1 | 0 | 1 |
23.6 | 19.8 | 2 | 0 | 0 |
36.8 | 20.2 | 3 | 0 | 0 |
41.2 | 16.5 | 4 | 1 | 1 |
# This map data should be placed into a folder visible to R, with a name like mapdat.
# It is read in to R with a command like:
> newmaps <- read.table("mapdat", header=T)
# The BAM algorithm can then be run as before, for example with a radius of 6
# (in whatever units the x and y coordinates are measured in):
> bam.func(newmaps, radius=6)
# If the clique structure is also to be read in manually, the data file can again
# The arrangement of the clique data is as follows. Suppose that site 1 has clique (2, 3, 4);
# be plain text and tab or space delimited. It must have columns headed
# site and clique. Column site refers to the site number of
# the site in question (as in the site column of the map data).
# Column clique lists, one row at a time, all the site numbers of sites in the
# same clique as the site in question.
# site 2 has clique (1, 10); site 3 has clique (1, 4, 21); and so on.
# The data should be entered as follows into the text file:
site | clique |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 10 |
3 | 1 |
3 | 4 |
3 | 21 |
: | : |
# Each pair of sites is given a separate row, and repetitions are expected (e.g. sites 1 and 2 are
# listed together both under site 1 and site 2).
# The clique data should be stored in a file named something like cliquesfile, in a folder
# visible to R. After reading the clique data into R, the BAM functions are run with
# argument cliquesdat replacing the radius option:
> newcliques <- read.table("cliquesfile", header=T)
> bam.func(newmaps, cliquesdat=newcliques)