BAM demonstration

Demonstration of R code for the Best Attainable Match

Colour coding:
      # comments
      > user commands
      computer output

# First simulate two presence-absence maps on a 40x40 grid of squares.
# These maps will be compared for similarity.
# Each map is generated so that the probability of presence
# in every site is p.pres=0.2.

> mymaps <- makemap.func(40, 40, p.pres=0.2)

# View the maps that have been generated.

> mymaps

x y site map1 map2
1 1 1 0 1
2 1 2 0 0
3 1 3 0 0
4 1 4 1 1
5 1 5 0 0
6 1 6 0 1
7 1 7 0 0
8 1 8 0 0
9 1 9 0 0
: : : : :

# Interpretation:

# x: the x-coordinate of the site. Ranges from 1 to 40 in the 40x40 grid,
# but any real coordinates are acceptable.

# y: the y-coordinate of the site. Ranges from 1 to 40 in the 40x40 grid,
# but any real coordinates are acceptable.

# site: site number given to each site. Ranges from 1 to 1600 in the example,
# but any set of unique numeric labels is acceptable.

# map1: record of presence (1) or absence (0) in each site in the first map.
# map2: record of presence (1) or absence (0) in each site in the second map.

# Plot images of the maps that have been generated:

> par(mfrow=c(2,2)) # (allow 2 rows and 2 columns per graph sheet)
> image(matrix(mymaps$map1, nrow=40), main="Map 1")
> image(matrix(mymaps$map2, nrow=40), main="Map 2")

# Find the best attainable match of map1 and map2, choosing the clique
# of any site (set of sites with which it can be "swapped") to be its eight
# nearest neighbours: horizontal, vertical, and diagonal.
# This means the clique contains all sites within a radius of the square root of 2
# of the central site.
# To avoid rounding errors, it may be advisable to add a small number to sqrt(2)
# and use radius = sqrt(2)+0.01 (say).
# The command is:

> bam.func(mymaps, radius=sqrt(2)+0.01)

# (note that map 1 and map 2 are randomly generated, so your results will
# be different.)

Singletons sweep 1 completed. Swap = 155
Exact connected component: swap = 2
Exact connected component: swap = 2
Exact connected component: swap = 2

SUMMARY
Swap attributable to singletons: 155
Swap attributable to exact connected components: 6
Number of 2-blocks: 0
Number of 3-blocks: 0
Number of 4-blocks: 0
Number of 5-blocks: 0
Total number of blocks: 0

Number of sites: 1600
Original match: 1075
Best possible swap: 161
Best attainable match: 1397

[1] 1397

# Interpretation:

# Most of the output can be ignored: details about singletons (1-blocks), exact
# connected components, and other blocks just show how the algorithm is running.
# If the output freezes for several seconds and then reports a 12-block (say), it just indicates
# that the bipartite graph around which the algorithm is constructed is fairly complex.

# The last four lines give the important output:
# there are 1600 sites in each of the two maps (due to the 40x40 grid structure);
# the original match (1075) is the simple matching coefficient: the number of
# sites that had the same status (0-0 or 1-1) in maps 1 and 2;
# the best possible swap (161) is the total number of within-clique swaps possible;
# the best attainable match (1397) is the number of matched sites, after all
# within-clique swaps have been made.

# The numbers are related via best attainable match = original match + 2 x (best possible swap).

Using the "cliquesdat " option instead of "radius"

# Instead of using the "radius" option, the clique structure can be entered directly.
# Alternatively, if several different maps are to be matched on the same set of
# sites and with the same radius, it will save a small amount of time to set up
# the clique structure once at the beginning and use it thereafter. Either way, the
# BAM algorithm should be invoked with the cliquesdat argument instead
# of the radius argument.

# The following example shows how to use the cliquesdat argument instead
# of the radius argument. The two arguments can not be used together.

# First create sets of maps on a 40x40 square grid:

> mymaps1 <- makemap.func(40, 40)
> mymaps2 <- makemap.func(40, 40)

# Next create the clique structure for a 40x40 square grid with radius=1.5.
# (It doesn't matter which map set is used for this step, as long as all map sets are
# on the same grid.)
# Store the clique structure in dataset mycliques.40x40.1.5.

> mycliques.40x40.1.5 <- makeclique.func(mymaps1, 1.5)

# View the first few lines of the cliques structure:

> mycliques.40x40.1.5[1:10,]

site clique
1 1 2
2 1 41
3 1 42
4 2 1
5 2 3
6 2 41
7 2 42
8 2 43
9 3 2
10 3 4
: : :

# Interpretation:

# The site column lists the sites from 1 to 1600.
# The clique column lists one by one the sites that belong to the clique of each site.
# Thus site 1 has in its clique sites 2, 41, and 42;
# site 2 has in its clique sites 1, 3, 41, 42, and 43;
# etc.
# The cliques must be entered in this format, one line for every pair of within-clique sites.

# Now find the BAM for the maps in mymaps1:

> bam.func(mymaps1, cliquesdat=mycliques.40x40.1.5)

[Answer: 1389; yours will be different]

# Find the BAM for the maps in mymaps2:

> bam.func(mymaps2, cliquesdat=mycliques.40x40.1.5)

[Answer: 1373; yours will be different]

# This has exactly the same effect as using the radius option directly:

> bam.func(mymaps1, radius=1.5)

[Answer: 1389 as above]

> bam.func(mymaps2, radius=1.5)

[Answer: 1373 as above]

# The difference is just that the "cliquesdat" option stores the clique structure
# permanently in object mycliques.40x40.1.5, and a little less calculation
# is required for bam.func in the first method than in the second method.

Manually entering the map data and/or the clique structure

# For real problems, the map data will usually have to be read into R from
# an external source. The clique structure might be determined by the radius
# option, or read in manually along with the map data from an external file.

# For manually entering data, it is important to get the format of the data files exactly correct.
# The map data must have columns headed x, y, site, map1, and map2, although
# the ordering of the columns is not important. The x and y coordinates can be
# any real numbers; site labels should be numbers; and map1 and map2 should
# consist of 0s and 1s only. Any further columns of data will be ignored by the BAM functions.
# The data should be placed in a plain text file, space or tab delimited.

# The required format for the first few lines of the map data is below:

x       y       site map1 map2
31.1    16.2    1 0 1
23.6    19.8    2 0 0
36.8    20.2    3 0 0
41.2    16.5    4 1 1

# This map data should be placed into a folder visible to R, with a name like mapdat.
# It is read in to R with a command like:

> newmaps <- read.table("mapdat", header=T)

# The BAM algorithm can then be run as before, for example with a radius of 6
# (in whatever units the x and y coordinates are measured in):

> bam.func(newmaps, radius=6)

# If the clique structure is also to be read in manually, the data file can again
# be plain text and tab or space delimited. It must have columns headed
# site and clique. Column site refers to the site number of
# the site in question (as in the site column of the map data).
# Column clique lists, one row at a time, all the site numbers of sites in the
# same clique as the site in question.

# The arrangement of the clique data is as follows. Suppose that site 1 has clique (2, 3, 4);
# site 2 has clique (1, 10); site 3 has clique (1, 4, 21); and so on.
# The data should be entered as follows into the text file:

site clique
1 2
1 3
1 4
2 1
2 10
3 1
3 4
3 21
: :

# Each pair of sites is given a separate row, and repetitions are expected (e.g. sites 1 and 2 are
# listed together both under site 1 and site 2).

# The clique data should be stored in a file named something like cliquesfile, in a folder
# visible to R. After reading the clique data into R, the BAM functions are run with
# argument cliquesdat replacing the radius option:

> newcliques <- read.table("cliquesfile", header=T)
> bam.func(newmaps, cliquesdat=newcliques)

Last updated: 17th April 2003

x	y	site	map1	map2
1	1	1	0	1
2	1	2	0	0
3	1	3	0	0
4	1	4	1	1
5	1	5	0	0
6	1	6	0	1
7	1	7	0	0
8	1	8	0	0
9	1	9	0	0
:	:	:	:	: