Physical Performance and Decision Making in Association Football Referees: a Naturalistic Study

Although researchers have independently investigated the physical and decision-making (DM) demands experienced by sports officials, the combined impact of locomotion and physiological factors upon DM has received little attention. Using an innovative combination of video and Global Positioning System (GPS) technology, this study explored the movement, heart rate (HR) and DM of experienced football referees in their natural performance environment. A panel of independent referees analysed incidents (n = 144) taken from five referees in seven games in the New Zealand Football Championship (2005/06). The match-day referees made accurate decisions on 64% of the incidents, although their accuracy levels were not related to variables such as movement speed, HR, and cumulative distance covered. Interestingly, referees were on average only 51% accurate in the opening fifteen minutes of each half compared to 70% accuracy at all other times. This study demonstrated that it is possible to combine new emerging technologies to conduct a comprehensive study of naturalistic decision-making in sport.

Sports officials are regularly placed in the media spotlight as their decisions can affect the outcome of games and competitions [1]. Unfortunately, there is little scientific support to assist the performance and development of these individuals. Although empirical research into sports officiating is growing [e.g. 2, 3], there is still a shortage of information about the different demands of football refereeing.
The roots of existing decision-making (DM) research are entrenched in cognitive psychology and many studies have examined simulated tasks in carefully controlled, laboratory environments [4]. More recently, sport scientists have turned to the strategy of making expert-novice performer comparisons to gain insights in realistic performance environments [5]. One rationale fuelling both lines of research is that it might be possible to teach people to look, think and act like expert decision-makers. As a consequence of such normative approaches, the process of DM has traditionally been viewed as a formal, analytical behaviour, emphasising aspects such as the use of advance perceptual cues and pre-planning, the estimation of risk and the prediction of likely outcomes [6].
Klein [6] is critical of traditional normative DM models, suggesting that they assume that the performer accesses a set of optimal procedures, which can be adopted only under ideal conditions. Such prescriptive procedures describe an assumed approach of "how to" make decisions, rather than exploring the actual processes that the experts adopt and are able to use very effectively under extreme pressure. For example, little research has considered the interacting and emergent effects of physical exercise, arousal, time pressure and anxiety on DM processes. In a quasi-naturalistic environment, Royal et al. [7] used a video-based, temporally occluded DM task to assess water polo players' decisions, revealing a small increase in DM performance under very high fatigue in comparison to low fatigue. However, this study only involved comparatively simple choice responses, so it is difficult to judge how this finding might relate to more complex decisions. The challenge for current research is to develop naturalistic methodologies to better understand the strategies that experts use to become effective decision-makers when exposed to realistic, time-pressured environments such as those present in competitive sport [8].
What difficulties are associated with developing naturalistic methods in sport science research and how have we dealt with them in the current study? First, participants need to be studied whilst immersed in an environment that is representative of their typical performance environment, which makes experimental control difficult to achieve [9]. Second, the researcher must allow the physical behaviour of the participants to emerge as it might during performance rather than artificially restricting the performer to certain types of responses [10]. Third, a related but equally important issue concerns the inter-relatedness of perception and action. In the past, researchers have tended to study each aspect as an independent entity in isolation from the other. However, ecological psychologists warn us that the processes of perception and action are mutually co-dependent and therefore should not be examined separately [11]. In the present study we sought to address these difficulties by examining together the physical performance and decision making of football referees who were officiating in actual competitive matches. This approach is naturalistic in that we have accepted the inherent variability (and lack of control) associated with the referees' typical performance environment, as removing this feature in impoverished simulations is likely to alter their decision-making. Furthermore, the availability of modern, portable equipment such as GPS units and digital video allowed us to analyse the referees' behaviour without interfering with and/or separating the natural coupling that exists between perception and action.

Decision-Making and Football Refereeing
In a dynamic multi-actor environment, such as an association football match, there are clearly a number of factors impinging upon the DM processes of the referee. For example, effective decisions should take into account the laws of the game, any relevant prior events (e.g. number of fouls committed by a player), the situational context of the game at the time [12] and also factors such as the positioning of the referee and his/her interaction with assistant referees [13]. Whilst one can be reasonably confident about what information football referees may be using, it is notoriously difficult to directly access information regarding the procedural knowledge used in such naturalistic environments, i.e., the processes performers use to arrive at a decision. Such processing may occur on both conscious and sub-conscious levels simultaneously and is therefore difficult for the performer to recall accurately in retrospective interview [14].
Programmes of research that have investigated accuracy of DM amongst high-performing rugby referees suggest that assessment and training for referees should use 'naturalistic' paradigms, i.e., where referees are studied performing in their real-world environment rather than in a laboratory [12]. For example, quasi-naturalistic (e.g. video based) simulations of the complex tackle law in rugby union and league have identified that national panel referees make correct decisions on approximately 50% [2,15] and 65% [16] of occasions respectively. Using a similar approach in football, Fuller, Junge, and Dvorak [17] found the decisions of Union of European Football Associations (UEFA) referees to be 70% accurate during actual performance, as judged by an expert panel of UEFA referees. Furthermore, retrospective video analysis of refereeing performance at the 2002 Football World Cup in Japan and Korea by Gilis, Weston, Helsen, Junge, and Dvorak [18] showed that referees made the correct decision on 60% of player-player contact incidents. Given that referees are required to make an average of 44 observable decisions per game (arguably more when one includes those that are not observable) without any input from their assistants [19], such levels of accuracy could have a large impact on the game.
Recently, research has begun to consider the factors that might affect the quality of referees' decisions. For example, there is some compelling evidence that referee bias contributes to home advantage in football [20,21]. It is also possible that exercise-induced fatigue can negatively impact DM. Borotikar, Newcomer, Koppes and McLean [22] suggest that as muscular fatigue accumulates during exercise a heavier emphasis placed upon centralised neural control may be paralleled by increased mental load and consequently less time spent 'on-task'. There has also been increasing attention directed towards Assistant Referees (ARs). Of interest to the current investigation is the suggestion that ARs' performance is impaired when their speed of movement increases to a run or sprint [11]. In summary, research would suggest that it is important to consider the effect of factors such as response biases (e.g., home field advantage), fatigue, and movement speed on referees' DM performance [23].

The Physiological Demands of Football Refereeing
Over the last 20 years, at least eleven studies have directly examined the physiological demands of football refereeing at various levels of the game and in a range of countries (see Table 1). Referees have been shown to cover between 7.5 and 11.5 km per game, with games played at a higher level (e.g. international matches) typically resulting in officials covering a greater distance [24,25]. Despite covering more ground, the UEFA international referees have lower heart rates (HR) (mean = 155 bpm ± 16, approximately 85% HRmax [19]) when compared to lower ranked referees (mean = 165 bpm ± 8, approximately 95% HRmax [24]), suggesting, as one would expect, greater fitness levels for the 'top' referees. There is a reasonable degree of consistency over the percentage of time spent moving at different speeds, although subtle variations exist depending upon how activity categories have been classified. Consistent across all of these studies, referees cover a greater distance in the first half than in the second half. A number of related explanations have been proposed for these findings, suggesting that the referees are either more fatigued in the second half, that they attempt to conserve energy as the game progresses, that the tempo of the game decreases as players become more fatigued, and/or that it occurs as a function of the increased playing time due to more substitutions in the second half [25]. The gradual reduction of high intensity running and backward running amongst referees in the second half [26] provides one explanation, although other studies have failed to find any differences in estimated energy expended [27] or HR [24,28] between the first and second halves.
Clearly, there is a need for more information regarding referees' HR responses and different speeds/types of locomotion to ascertain why they cover less of the pitch in the second half. Furthermore, since HR alone does not account for the physiological workload of the referee [27], and in order to relate these data to the quality of referees' decisions, a multi-disciplinary approach is necessary, measuring both physical performance (e.g. distance covered, amount of high intensity running, and HR) as well as DM performance.
To our knowledge, only one study [i.e. 35] has attempted to directly relate the locomotor demands of refereeing to referees' cognitive performance. Verheijen et al. [35] compared examples of correct and incorrect decisions from elite youth-league referees. As in the AR study of Oudejans et al. [11], the referees typically moved more slowly when they made correct decisions; it was therefore recommended that they make decisions when walking, rather than when running. As this interesting finding was obtained from a small sample (n = 3) and the referees only participated in 20 minutes of youth league games, it remains unclear whether it generalises to a complete senior match.
In order to address many of the questions raised above, our aim was to trial the use of a Global Positioning System (GPS) to collect movement and HR data, together with video recordings, to examine both the independent and interactive demands of locomotor and DM performance of a group of national level referees. On the basis of past research, we predicted that referees would cover less ground and perform less high-intensity running in the second half compared to the first half. In addition, we anticipated that DM performance would decrease (i.e., referees would make more mistakes) as referees became fatigued later in the game.

Participants
Match Officials. Five New Zealand Football Championship (NZFC) referees, officiating in seven NZFC games, agreed to participate in the study. Two referees participated twice in games involving different teams. All held the NZ badge qualification and two were Fédération Internationale de Football Association (FIFA) qualified referees. This represents half of the referees who officiate in this league. They were all male, aged 31-43 yrs (mean = 38.2 yrs, s = 5.89) and all had refereed in the National League for at least 4 years. New Zealand Soccer provided the estimated maximal oxygen uptake (V̇O2max) scores achieved by each referee completing a 12-minute run [see 36] at the beginning of the season (mean = 55.4 ml·kg⁻¹·min⁻¹, s = 5.90). All the referees received a detailed explanation of the purpose of the research and were assured of the confidentiality of their datasets prior to the study commencing.
Referee Panel. A separate panel of five experienced referees also participated by providing independent judgments of selected incidents in the games from edited video clips. Participants were recruited from members of the local referees' society, all of whom officiate alongside the subject sample during New Zealand's winter competitions. All held the NZ badge qualification and one was a FIFA qualified referee. The age and experience of these referees (mean age = 38.2 yrs, sd = 12.6; mean experience = 9.5 yrs, sd = 7.1) also closely matched that of the subject sample. These individuals were sent a copy of the DVD and a set of rating sheets and asked to complete the ratings in their own time. The rating sheet included instructions asking them to watch each clip as many times as they felt necessary in order to be confident in their decision and to indicate the number of viewings required on the sheet. They also indicated whether the clip was of sufficient quality and held enough information to make an informed decision, and added any comments they felt necessary to include about the clip.

Procedure
Data were collected at seven home games of Otago United Football Club in the NZ Football Championship (played between November 2005 and February 2006). Each match was recorded by two JVC-2000 DV cameras from elevated positions at the top of the main grandstand and on the opposite side of the pitch. The pitch side camera was manually operated to provide a detailed close-up view of the action, while the grandstand camera, also manually operated, recorded a wide-angle view including both the active play and the position of the referee.
Each referee was fitted with a HR monitor and a SPI-10 Global Positioning System (GPSports, Fyshwick, Australia) transmitter 45 minutes before the start of each game. The GPS transmitter was worn in a light harness placed over the shoulders and under the referee's shirt. The GPS equipment collects and stores positional data at a sampling frequency of 1 Hz by comparing signals from between 6 and 9 satellites. The equipment also records, by radio telemetry, the HR signals from a strap worn around the referee's chest. The initiation of video and GPS recording was manually synchronised 30 minutes before the kick-off of each game to ensure that all data had the same timeline. The data recording ended as the referee left the pitch at the end of each game.

GPS Analysis
The GPS transmitter recorded the referee's position, speed of movement, and HR at 1-second intervals throughout each game. A recent assessment of the validity of this GPS system has found a relatively small systematic overestimation of absolute distance (within 4.8% ± 7.2% [37]). From the raw data, a number of other variables related to physical demand were calculated by the GPS equipment's software (such as frequency and distance of sprints/jogs/walks, and percentage of time spent engaging at different exercise intensities). The following locomotor categories were used: standing and walking 0-7 km·h⁻¹; jogging 7-12 km·h⁻¹; moderate running 12-18 km·h⁻¹; and sprinting above 18 km·h⁻¹ (adjusted from speeds used by Drust, Reilly, & Cable [38]). These four categories were subsequently grouped into two locomotor categories: (1) low intensity activity encompassing all activities below 12 km·h⁻¹; and (2) high intensity running that encompasses all running above 12 km·h⁻¹. The percentage of maximum HR (%HRmax) was calculated by dividing each participant's raw HR scores by their maximal HR value achieved throughout the game, since maximum HR has been shown to be consistently higher during match observations than during lab based tests [19].
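The speed categories and the %HRmax normalisation described above can be sketched in a few lines of Python. This is an illustrative sketch only and is not the GPS vendor's software: the function names are hypothetical, the treatment of boundary speeds (each upper limit is taken as inclusive) is an assumption because the text does not specify it, and the distance estimate simply sums speed over the 1-second samples rather than reproducing whatever method the equipment uses.

```python
# Hypothetical helpers; thresholds (km/h) are those given in the text.
ZONES = [
    (7.0, "standing/walking"),
    (12.0, "jogging"),
    (18.0, "moderate running"),
]


def locomotor_category(speed_kmh):
    """Return the locomotor category for one 1 Hz speed sample."""
    for upper, name in ZONES:
        if speed_kmh <= upper:  # assumption: upper limits are inclusive
            return name
    return "sprinting"  # anything above 18 km/h


def is_high_intensity(speed_kmh):
    """High intensity running is all running above 12 km/h."""
    return speed_kmh > 12.0


def distance_m(speeds_kmh, sample_interval_s=1.0):
    """Approximate distance (m) by summing speed x time over samples."""
    return sum(v / 3.6 * sample_interval_s for v in speeds_kmh)


def percent_hr_max(hr_samples):
    """Express each HR sample as a percentage of the peak HR in the game."""
    peak = max(hr_samples)
    return [100.0 * hr / peak for hr in hr_samples]


def percent_time_by_category(speeds_kmh):
    """Percentage of samples (i.e. of time at 1 Hz) in each category."""
    counts = {}
    for v in speeds_kmh:
        name = locomotor_category(v)
        counts[name] = counts.get(name, 0) + 1
    total = len(speeds_kmh)
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Because each sample represents one second, the share of samples in a category is also the share of playing time spent in it.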

Video Analysis
The videotapes of each match were subsequently analysed by an experimenter (a qualified football association referee) who edited together all the clips where a foul, or potential foul, had occurred. These tapes were then reviewed by two other experimenters, one of whom was a qualified referee, to identify referee decisions when contact occurred between opposing players or a potential handball offence occurred. These included fouls and misconduct incidents, with a range of challenges that required the referee to decide if players had been tripped, kicked, pushed, charged, jumped, or held, or had committed a handball offence (as stipulated by the laws of the game, FIFA, 2007 [39]), and they included incidents where the referee apparently missed a decision, as adjudged by the experimenter. We also controlled for the influence of the assistant referees in helping the referee during certain incidents. For example, we chose not to analyse offside situations and instead focused on incidents in which the referee alone made the decision. In all except one of the cases (1 of 144) the pitch-side, close-up camera perspective was used to identify tackle and handball incidents.
Incidents from the seven games were professionally edited using Final Cut Pro (version 3.0.4 for Mac OSX). Each clip was preceded by a title explaining the clip number and included approximately 5 seconds of 'lead-in' of preceding action to orientate the viewer to the context of the game, and each clip finished approximately 1 second after the incident [cf. 15]. Thus the clips ranged from about 6 to 10 seconds in length. Immediately after each incident the volume was dubbed out to remove crowd and player reactions. In cases where the match referee was in the frame he was digitally occluded with a black rectangle, again after the incident had occurred, to ensure that the viewers were not able to see or be influenced by his decision, as the panel's role was to adjudicate on the decision, not the match referee's performance. Subsequent analysis of the referee panel's responses on the quality of each clip and their additional comments revealed that this did not affect the panel's decisions or the number of viewings required. At the start of each set of clips for each game a head title came up on screen depicting the start of the game, with a "half-time" title after the first half clips to allow the viewers to orient themselves to the direction of play.
From this editing process 144 foul incidents from the 7 games (approximately 21 per game) were then transferred onto DVD, with each clip indexed so that viewers could easily review each clip by the push of a button. Copies of the DVD were then sent to an independent group of 5 experienced referees. Using a pre-prepared questionnaire, this 'expert panel' gave independent decisions for each of the video clips, indicating their decision and the number of viewings required to arrive at this decision, with a space to comment on the quality of the clip. We used the number of times that panel members had to view each clip to provide an indirect indication of decision difficulty. It should be acknowledged that it is possible that this difficulty might reflect inadequacies in the video clip rather than the inherent difficulty of assessing the situation. However, none of the clips were reported to be of insufficient quality to be able to make an informed decision. There were also very few critical comments from the panel members about the clip quality and the panel subsequently confirmed that repeated viewings were necessary for more difficult incidents. Therefore it is reasonable to assume that those incidents that were more difficult for the panel members to judge (due to the speed of events, nature of the player contact etc.) would also have been more difficult for the match referees.

Statistical Analysis
Various aspects of the GPS and HR data were summarised for each referee and compared between the first and second half with paired sample t-tests. The expert panel questionnaires were collated to identify incidents where a consensus decision had been made. The panel was deemed to have reached agreement when at least 3 of the 5 judges awarded possession to the same team. The uniformity between the judges for each clip was further quantified using correlation. Only the clips in which panel agreement was achieved (n = 127) were submitted to further analysis. This sub-set of incidents was contrasted with the match-day referees' actual decisions. 'Accurate' and 'inaccurate' decisions made by the match-day referee were grouped and the GPS and HR data associated with these incidents were compared with independent t-tests. For each dependent variable, the assumption of homogeneity of variance was confirmed prior to any further statistical analyses being conducted. Nonparametric Friedman's analysis of variance was used to compare the differences in decision accuracy in each period of the game. For categorical data (i.e., difficulty of decisions) chi-square tests were employed. The level of statistical significance was set at P = 0.05.
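The consensus rule and the subsequent comparison with the match-day referee can be illustrated with a short Python sketch. It is a hedged illustration under stated assumptions rather than the authors' procedure: the function names and the outcome labels ("home", "away", "play on") are hypothetical, and the data layout (one referee decision paired with five panel judgements per clip) is assumed for the example.

```python
from collections import Counter


def panel_consensus(judgements, min_agree=3):
    """Return the outcome chosen by at least `min_agree` judges, else None.

    With five judges and a threshold of three, at most one outcome can
    reach the threshold, so ties at the top cannot produce a consensus.
    """
    outcome, votes = Counter(judgements).most_common(1)[0]
    return outcome if votes >= min_agree else None


def referee_accuracy(clips):
    """Count referee decisions that match the panel consensus.

    `clips` is an iterable of (referee_decision, panel_judgements) pairs.
    Clips without consensus are excluded, as in the analysis described.
    Returns (number_correct, number_of_consensus_clips).
    """
    correct = 0
    analysed = 0
    for referee_decision, judgements in clips:
        consensus = panel_consensus(judgements)
        if consensus is None:
            continue  # no agreement: clip dropped from further analysis
        analysed += 1
        if referee_decision == consensus:
            correct += 1
    return correct, analysed
```

Applied to the full data set, a function of this kind would return the two counts from which the accuracy percentage is derived.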

Movement and Heart-Rate Analysis
In the present study referees covered 10,323 m on average (s = 486 m) during a game. Whilst the referees appeared to cover more ground in the first half than the second half (see Table 2), this difference was not statistically significant (P > .05). Despite the trend for distance covered being greater in the first half, the second half of games typically lasted longer than the first half (on average by 1 min 29 sec). The referees' average HR during playing time was 163 bpm (s = 8.6 bpm, 84% HRmax), with a higher mean HR in the first half in comparison to the second half (P < .05). In one case, the referee's HR was 15 bpm less in the second half, perhaps as a consequence of this referee sustaining a particularly high HR (175 bpm) in the first half.
The referees performed similar levels of high-intensity running in the first and second half (mean time = 30% vs. 31% respectively). Throughout the game referees spent 65% (s = 5.9) of the time standing or walking, 21% (s = 2.8) jogging, 12% (s = 3.4) moderate running and 2% (s = 1.3) sprinting. In the first half, the referees spent proportionally less time standing and walking and more time jogging than in the second half (see Table 2). In terms of distance covered within each speed zone there were similar distributions in each half (see Fig. 1).

Decision Making Analysis
Expert Panel. The coefficient of correlation between the judges' ratings indicated a high degree of uniformity (range = 0.3 to 0.63; all statistically significant at P < .018). Agreement was achieved on 88% of the clips (127/144), with an average of 4 out of the 5 experts agreeing on each decision. The difficulty of each decision, which was determined from the number of viewings reported by each judge, was evenly distributed across game time (0-15 mins = 1.7 viewings, 15-30 mins = 1.6, 30-45 mins = 1.5, 45-60 mins = 1.7, 60-75 mins = 1.6, 75-90 mins = 1.7).

Accuracy of Decisions.
The match-day referees made the same decision as the panel on 64% of occasions (awarding 81 out of 127 clips correctly). From the occasions where the referee and panel's decision did not concur (n = 46, 36%), 54% (n = 25) arose because the panel saw no infringement and thus the referee erred in penalising the challenge, and 41% (n = 19) were for missed decisions. Therefore, there did not seem to be any strong bias towards over-penalising or under-penalising amongst incorrect decisions.
There was also a reasonable balance between the match-day referees and the expert panel in awarding decisions to either the home or away teams. The match-day referees awarded 45 decisions (35%) in favour of the home team and 39 decisions (31%) to the away team and decided there was no infringement in 43 cases (34%). This finding could possibly suggest a small refereeing bias towards the home side. However, the expert panel, who were arguably less susceptible to influential factors such as the crowd, awarded a similar distribution of decisions, i.e., 44 (35%) should have resulted in home free kicks, 37 (29%) should have been awarded to the away team, and 46 (36%) should have been play-on situations. When referees were inaccurate in their decisions, once more this did not seem to bias one team over another as exactly half of the mistakes were given in favour of the home team (n = 23) and half in favour of the away team (n = 23).
The referees were less accurate in the opening 15 minutes of each half (1st 15 minutes, mean = 51% correct; 2nd 15 minutes, mean = 69%; 3rd 15 minutes, mean = 70%) than they were at any other period (see Fig. 2). The small sample size (n = 7) meant that these differences were not statistically significant. However, the addition of only 4 participants (first half) or 5 (second half) following the same trend would have yielded highly significant (P < .03) differences.
Of the increased errors in the first 15 minutes, 45% (n = 9) were due to penalising when the referee should not have, 20% (n = 4) were for not awarding a home free kick and 35% (n = 7) were for not awarding a free kick to the away team. Thus, there was no clear pattern of over-penalising (45%) or under-penalising (55%) in the first 15-minute period of each half.

Speed of Movement.
The referees' accuracy did not vary with the speed of their movement (t (125) = 0.08, P = 0.9, Cohen's d = 0.02), as the average speed for correct (mean = 6.9 km/hr, s = 4.3) and incorrect decisions (mean = 7.0 km/hr, s = 5.3) was not significantly different.
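As a worked illustration of the effect size reported for movement speed, the following Python sketch computes Cohen's d from summary statistics using a pooled standard deviation. It rests on assumptions: the group sizes (81 correct and 46 incorrect decisions) are taken from the accuracy counts reported earlier and are consistent with the 125 degrees of freedom, the pooled-SD form of d is assumed because the text does not state which variant was used, and the function names are hypothetical. Since the published means are rounded to one decimal place, a t value recomputed from them will not necessarily match the reported statistic.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d: difference in means divided by the pooled SD."""
    return (m2 - m1) / pooled_sd(s1, n1, s2, n2)


# Correct decisions: mean 6.9 km/hr, s = 4.3 (n = 81, assumed from the text)
# Incorrect decisions: mean 7.0 km/hr, s = 5.3 (n = 46, assumed from the text)
d = cohens_d(6.9, 4.3, 81, 7.0, 5.3, 46)
print(round(d, 2))  # 0.02
```

An effect of this size is far below the conventional threshold for even a small effect, which fits the conclusion that speed of movement was unrelated to accuracy.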

Difficulty of Decisions.
There was a strong relationship between incident difficulty and the correctness of the match referees' decisions (χ²(1, N = 127) = 15.5, P < .0001). For the 33 most difficult incidents (i.e., viewed most often by the panel) match referees' decisions were only 36% correct compared with 75% correct for the remaining 94 clips. There was no statistically significant difference between the most difficult decisions and the others in terms of referees' HR or locomotion.

DISCUSSION
The aim of this study was to examine the DM and locomotor performance of top football referees. In accordance with other investigations [e.g. 26,31], football referees covered on average nearly 10.5 km during a game, with the majority of the distance covered in the first half. There was no difference in the proportion of high intensity running performed although average heart rates dropped from the first half (mean = 166 bpm, s = 7.4) to the second half (mean = 160 bpm, s = 9.4). Previous research has questioned whether the reduction of physical activity in the second half may be due to referee fatigue or possibly because the tempo of the game decreases as players get tired [e.g. 25,27,29,40]. Taken together, the movement and HR scores presented here lend partial support to both interpretations. For example, if locomotor activity was limited by referee fatigue in the second half, one might expect higher HR values. However, the referees had higher mean HRs in the first half than the second half. As player work rates were not monitored in the present study, we are unable to confirm that the tempo of the game decreased in the second half. However, the fact that referees' heart rates were lower in the second half than in the first half suggests that the referees were conserving energy as they became more fatigued [see also 26]. Further research in which the work-rates of players and officials are measured simultaneously would be necessary to confirm our interpretation of the data.
Castagna and D'Ottavio [29] found that elite Italian referees sprint for 13% of match time, run for 25%, jog for 44%, walk for 9%, and move backwards for 9% of the time. However, the present study found a relatively smaller distribution of sprinting and running activities (2% and 12% respectively) with more jogging and walking/standing (21% and 65% respectively). This discrepancy may be explained partly by the slightly different locomotor categories preferred by Castagna and D'Ottavio (e.g. low-intensity running categorised as < 13 km·h⁻¹, whereas sprinting was > 24 km·h⁻¹) but more likely by the increased total distance covered by the Italian referees (approximately 11.5 km per game). Indeed, the intensity of play in the premier league matches played in Europe and New Zealand may be different, which has been shown to influence referee work rate [27].
Examination of referees' DM performance revealed figures in line with previous research, with match-day referees achieving on average 64% accuracy from the incidents selected (Fuller et al. [17]: 70% accuracy; Gilis et al. [18]: 60% accuracy). Interestingly, further inspection of the data revealed that referees were on average only 51% accurate in the opening 15 minutes of each half, and 70% accurate at all other times. Our analysis suggests that with only a moderately larger sample (e.g., N = 12 referees) these differences would have been statistically significant. This is an intriguing possibility with important practical applications.
Intuitively, one might assume that referees begin the game by "laying out their stall" and refereeing strictly to the letter of the law, an approach that is often propounded by referee associations. However, there was no trend towards either over-penalising or under-penalising during the early period of each half. Alternatively, Adams [41] might attribute such a dip in performance, after a period of rest, to some form of warm-up decrement. Anecdotally, the referees performed no obvious mental warm-up techniques in association with their physical warm-up immediately prior to each half, thus potentially reducing their initial DM performance levels [41]. More likely, these phases of the game present periods of relative instability where teams have yet to settle into established patterns of play, and equally the referee attempts to find appropriate solutions to the game s/he is presented with, by setting boundaries that are synchronous with the game [42]. Regardless of the reasons for poorer performance during the opening period of each half, referees need to ensure that they conduct warm-ups that combine physical and mental demands so that they are primed for the challenge that the game is likely to present. It is also worth pointing out that the accuracy data presented are probably underestimates for all the decisions a referee makes in a game, as only a selection of incidents were analysed in the present study.
Investigation of the balance of both correct and incorrect decisions (i.e., whether the decisions favoured the home or away team) revealed no bias, despite contrary evidence from previous research [e.g. 20, 21]. The match-day referee and the experienced panel (who were not susceptible to player or crowd coercion) gave a similar distribution of decisions in favour of the home and away teams.
Although research into referees and ARs has shown that speed of movement can affect decision accuracy [22,35], the current study did not replicate these findings. In the case of referees, the discrepancy might arise due to the different levels of football considered (youth vs. senior) and the fact that our referees participated in the whole match versus 20-minute segments [35]. In relation to ARs, perhaps the speed-accuracy decrement arose in Oudejans et al.'s study [11] because the ARs were trying to adjudicate on the relative positioning of players (essentially a perceptual judgment), which became difficult when their own speed increased. Referees are responsible for trying to adjudicate if a foul has occurred or not (i.e., a more complex type of decision drawing upon cognitive judgment), so speed of movement appears to affect these sorts of decisions less. The lack of any simple relationship between DM and speed of movement, HR, or cumulative distance in the current study indicates that none of these variables in isolation can be used to predict whether a correct or incorrect decision is more likely. Instead, a more complex, multivariate relationship between DM and physical performance is likely to underpin performance in naturalistic environments.
As we predicted, the match referees were less accurate as the decisions became more difficult, as objectively indicated by the number of repeated viewings required by the expert panel. Given the increasingly common use of television match officials (TMOs) in sports such as cricket, rugby union and rugby league, the present study raises some interesting questions applicable in a number of sports. From a DM perspective, it would be of interest to analyse the accuracy of the video official's decision having watched an incident several times relative to the frequency of accurate decisions made from the first viewing (i.e., are gut instincts most often correct?). Also, video officials are relatively insulated from the nuances of the game such as player and crowd reactions. They are required to make relatively passive judgments from a number of different perspectives compared to the active DM processing of the on-pitch referees, who may see and hear information that the video official cannot, and can also account for the context of the game [12]. To what extent does this passive presentation provide helpful (or conflicting) information from which to inform decisions? Furthermore, does the increasing presence of video officials take some of the control (and players' respect) away from the match officials? Further research will be necessary to address such issues, but for the time being it is likely that video officials will remain solely a useful aid to the match official(s) for certain types of decisions and sports.
Future research should investigate innovative methods to train referee DM whilst maintaining the naturalistic elements of the task. It is possible that officials could enhance their training by viewing multiple video clips of difficult incidents in the way that the TMO does in certain sports. A similar strategy has been shown to be effective amongst rugby football referees, leading to improved judgment accuracy [see 2]. Unfortunately, within the present study we did not collect any data on the subsequent DM performance of those referees who acted as expert judges to ascertain if they benefited from the experience. It would also be worthwhile investigating other training methods that could improve DM. One factor that has been investigated in other performance domains is the time allowed to make each judgment. There is some evidence that time compression in training certain skills (i.e., speeding up the rate at which events occur) can enhance subsequent performance. This is known as 'above-real-time training' and has shown some utility in training pilots and air traffic controllers [43]. Similar strategies are used regularly in police and army training, where recruits are progressively placed in stressful real-life situations to 'inoculate' them to extreme DM demands [44].

CONCLUSION
This is the first study to investigate both the DM accuracy and physical performance of football referees officiating in competitive matches. A number of findings reported (e.g. distance covered, HR and DM accuracy) support research conducted with top-level football referees in other countries. From the small sample of referees who participated, there appear to be no clear simple relationships between activity levels and DM performance. For example, it was anticipated that as referees fatigue their DM performance might deteriorate. However, there were no significant associations between variables such as the cumulative distance covered or HR and the quality of decisions. These findings should not be construed as implying that the DM and physical performance of the referee are independent. Instead we argue that these processes are intricately connected but their complex relationship is heavily influenced by a number of other factors (such as situational context), which remain to be studied together. Whilst the referees' level of accuracy may seem low at 64%, it should be pointed out that only a selection of incidents from the game were analyzed. In fact, the level of decision accuracy found in the present study was in line with those found in previous studies [17,18].
Given such findings and the relative ease of GPS data collection, this investigation highlights the value of such technology as a tool for sports scientists to investigate athletic performance. With further developments in GPS miniaturization, this technology may also present another practical solution to measuring referee and player locomotion in their naturalistic environment without compromising their safety or performance.

Fig. (2). Proportion of incorrect decisions made by match-day referee at different phases of the game (error bars indicate standard deviation amongst referees).