First Observation of Electroweak Single Top Quark Production

We report the first observation of single top quark production using 3.2 fb^-1 of pbar p collision data with sqrt{s}=1.96 TeV collected by the Collider Detector at Fermilab. The significance of the observed data is 5.0 standard deviations, and the expected sensitivity for standard model production and decay is in excess of 5.9 standard deviations. Assuming m_t=175 GeV/c^2, we measure a cross section of 2.3 +0.6 -0.5 (stat+syst) pb, extract the CKM matrix element value |V_{tb}|=0.91 +-0.11 (stat+syst) +-0.07 (theory), and set the limit |V_{tb}|>0.71 at the 95% C.L.

In the standard model (SM), top quarks are expected to be produced singly in pp̄ collisions through s-channel or t-channel exchange of a virtual W boson [1]. The reasons for studying single top quarks are compelling: the production cross section is directly proportional to the square of the CKM matrix [2] element |V_{tb}|, and thus a measurement of the rate constrains fourth-generation models, models with flavor-changing neutral currents, and other new phenomena [3]. Electroweak production of single top quarks is a difficult process to measure because the expected production cross section for the combined s- and t-channels (σ_{st} ∼ 2.9 pb [4,5]) is much smaller than those of competing background processes, and it is also smaller than the uncertainty on the total background rate. The presence of only one top quark in the event provides fewer features to use in separating the signal from background, compared with measurements of top pair production (tt̄), which was first observed in 1995 [6].
To overcome these challenges, a variety of multivariate techniques for separating single top events from the backgrounds have been developed. Using different combinations of techniques, both the CDF and D0 collaborations have published evidence for single top quark production at significance levels of 3.7 and 3.6 standard deviations, respectively [7,8]. The analysis described in this Letter supersedes that of Ref. [7] and achieves a significantly improved sensitivity by including a larger data sample and by adding three new analyses. We report a signal significance of 5.0 standard deviations, thus conclusively observing electroweak production of single top quarks, and we make the most precise measurement of |V_{tb}| to date.
We assume that single top quarks are produced in the s- and t-channel modes with the SM ratio, and that the branching ratio of the top quark to Wb is 100%. We seek events in which the W boson decays leptonically in order to improve the signal-to-background ratio s/b. We simulate single top events using the tree-level matrix-element generator MadEvent [9]. The t-channel signal is modeled by the two processes qb → q′t and qg → q′tb̄, which are combined to match the event kinematics predicted by a fully differential NLO calculation [5,10].
A total of six analyses are combined to yield the final results reported here. The likelihood function (LF), matrix element (ME), and neural network (NN) analyses of [7] are re-used with an additional 1 fb^-1 of integrated luminosity; their methods remain unchanged. The three new analyses introduced here are: a boosted decision tree (BDT), a likelihood function optimized for s-channel single top production (LFS), and a neural-network-based analysis of events with missing transverse energy E̸_T [11] and jets (MJ). The BDT and LFS analyses use events that overlap with the LF, ME, and NN analyses, while the MJ analysis uses an orthogonal event selection that adds about 30% to the signal acceptance. This paper concentrates on the three new analyses and their combination with the analyses of [7] using 3.2 fb^-1 of integrated luminosity collected with the CDF II detector [12].
For the LF, ME, NN, BDT, and LFS analyses we select ℓ + E̸_T + jets events as described in [7], where ℓ is an explicitly reconstructed electron or muon from the W boson decay and at least one jet is identified as containing a B hadron. The background has contributions from events in which a W boson is produced in association with one or more heavy-flavor jets (W+HF), events with mistakenly b-tagged light-flavor jets (mistags), multijet events (QCD), tt̄ and diboson processes, as well as Z+jets events. The expected event yields in Table I are estimated as in [7], where the signal, tt̄, and diboson categories are Monte Carlo (MC) predictions scaled to the total integrated luminosity while the remaining categories use predictions derived from data control samples. The uncertainties quoted in Table I include theoretical uncertainties, the luminosity uncertainty for the MC predictions, and experimental uncertainties for the data-driven background normalizations.
The MJ analysis is designed to select events with E̸_T and jets and to veto events selected by the ℓ + E̸_T + jets analyses. It accepts events in which the W boson decays into τ leptons and those in which the electron or muon fails the lepton identification criteria. We use data corresponding to 2.1 fb^-1 of integrated luminosity for the MJ analysis and select events that have E̸_T > 50 GeV and two jets within |η| < 2.0, at least one of which has |η| < 0.9. The jet energy measurements include information from both the calorimeter and the charged-particle spectrometer. Events must have one jet with transverse energy E_T greater than 35 GeV, and a second jet with E_T greater than 25 GeV. The angular separation between the two jets, ∆R = sqrt{(∆η)^2 + (∆φ)^2}, is required to exceed 1.0. We reject events with four or more jets with E_T > 15 GeV in |η| < 2.4 in order to reduce the multijet (QCD) and tt̄ backgrounds. We identify b jets with the same algorithm used in [7] supplemented with a jet probability algorithm [13].
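The kinematic requirements of the MJ selection can be summarized as a short sketch. It is illustrative only: the `Jet` class, the function names, and the choice to apply the E_T, |η|, and ∆R requirements to the two highest-E_T jets within |η| < 2.0 are assumptions, and the lepton veto, b-tagging, and trigger requirements are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float   # transverse energy in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians


def delta_r(a: Jet, b: Jet) -> float:
    """Angular separation sqrt((d_eta)^2 + (d_phi)^2), with d_phi wrapped to [-pi, pi]."""
    dphi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)


def passes_mj_kinematics(met: float, jets: List[Jet]) -> bool:
    """Apply the kinematic cuts quoted in the text (hypothetical helper)."""
    # Missing transverse energy above 50 GeV.
    if met <= 50.0:
        return False
    # Reject events with four or more jets with E_T > 15 GeV in |eta| < 2.4.
    n_counted = sum(1 for j in jets if j.et > 15.0 and abs(j.eta) < 2.4)
    if n_counted >= 4:
        return False
    # Assumption: use the two highest-E_T jets within |eta| < 2.0.
    candidates = sorted(
        (j for j in jets if abs(j.eta) < 2.0), key=lambda j: j.et, reverse=True
    )
    if len(candidates) < 2:
        return False
    lead, sub = candidates[0], candidates[1]
    # One jet above 35 GeV and a second jet above 25 GeV.
    if lead.et <= 35.0 or sub.et <= 25.0:
        return False
    # At least one of the two jets must be central (|eta| < 0.9).
    if min(abs(lead.eta), abs(sub.eta)) >= 0.9:
        return False
    # The two jets must be separated by Delta R > 1.0.
    return delta_r(lead, sub) > 1.0
```

Wrapping ∆φ before computing ∆R matters for jets on either side of φ = ±π, which would otherwise appear far apart.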
The primary background in the MJ analysis is QCD events in which mismeasured jet energies produce large E̸_T aligned in the same direction as jets. To reduce this background, we use the transverse momentum imbalance (p̸_T) as measured in the spectrometer. In this class of events, p̸_T is more strongly correlated with the neutrino energy and direction than E̸_T is. The absolute amount of E̸_T and p̸_T, the angle between them, the azimuthal angles between E̸_T or p̸_T and the jet directions, and several other less powerful variables are used as inputs to a neural network (NNQCD). The NNQCD output is required to pass a threshold, removing 77% of the QCD background while keeping 91% of the signal acceptance.
The backgrounds in the MJ analysis due to QCD events and events with light-flavor jets produced in association with W and Z bosons are estimated using data in a control region composed of events in which the E̸_T is aligned with one of the jets. The observed and expected event counts for the MJ analysis are given in the E̸_T + jets column of Table I.
After event selection, the samples are dominated by background. We further discriminate the signal with multivariate techniques. Each multivariate technique defines a function which reduces several reconstructed quantities for each event into a single output variable whose distribution can be studied and fit to extract signal and background contributions. Validation of the background modeling for the input variables and output distributions is a crucial step in the use of multivariate techniques. We first describe the construction of our multivariate tools and then the checks we used to prove the validity of our background model. The LF, ME, and NN discriminants are described in [7].
The BDT discriminant uses a decision tree method that applies binary cuts iteratively to classify events [14]. The discrimination is further improved using a boosting algorithm [15,16]. The BDT discriminant uses over 20 input variables. Some of the most sensitive are the neural-network jet-flavor separator [17], the invariant mass of the ℓνb system M_{ℓνb}, the total scalar sum of transverse energy in the event H_T, Q × η [18], the dijet mass M_{jj}, and the transverse mass of the W boson.
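To illustrate the idea of boosting binary cuts, the following is a minimal sketch that uses AdaBoost with single-cut trees ("stumps"). AdaBoost, the stump depth, and all function names are assumptions made for illustration; the algorithm and configuration actually used are those of Refs. [14-16].

```python
import math
from typing import List, Sequence, Tuple

# A stump is one binary cut: (feature index, cut value, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Return +1 (signal-like) or -1 (background-like) for one event."""
    feat, cut, pol = stump
    return pol if x[feat] > cut else -pol


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Scan all features and cut values for the lowest weighted error."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for cut in sorted({row[feat] for row in X}):
            for pol in (+1, -1):
                stump = (feat, cut, pol)
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if stump_predict(stump, xi) != yi
                )
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def train_adaboost(X, y, n_rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Train a boosted set of stumps; labels y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified events, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, xi))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def bdt_score(model, x: Sequence[float]) -> float:
    """Weighted vote of all stumps, normalized to the range [-1, 1]."""
    norm = sum(a for a, _ in model) or 1.0
    return sum(a * stump_predict(s, x) for a, s in model) / norm
```

Each round reweights the training events so that the next cut concentrates on events the previous cuts misclassified; the final discriminant is the weighted vote.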
The LFS discriminant uses projective likelihood functions [19] to combine the separation power of several variables and is optimized to be sensitive to the s-channel process. The subset of the ℓ + E̸_T + jets sample with two b-tagged jets is used and consists of 609 events. The dominant backgrounds are W+HF and tt̄ production. A kinematic fitter is used to find the most likely resolution of two ambiguities: the z-component of the neutrino momentum and the b jet that most likely came from the top quark decay. In addition to the outputs of the kinematic fitter, other important inputs to the likelihood are the invariant mass of the two b-tagged jets M_{bb}, the transverse momentum of the bb̄ system, the leading jet transverse momentum, M_{ℓνb}, H_T, and E̸_T.
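A projective likelihood treats each input variable independently: one-dimensional signal and background templates are built per variable, and their products are combined into a single ratio. The sketch below shows one common form, L = Π s_i / (Π s_i + Π b_i). The exact construction of Ref. [19], the binning, and all names here are assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Histogram bin for a value; under- and overflow go to the edge bins."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def make_template(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized 1D histogram; a small floor avoids empty bins."""
    counts = [1e-6] * (len(edges) - 1)
    for v in values:
        counts[bin_index(v, edges)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def projective_likelihood(
    event: Sequence[float],
    edges: Sequence[Sequence[float]],
    sig_templates: Sequence[Sequence[float]],
    bkg_templates: Sequence[Sequence[float]],
) -> float:
    """Combine per-variable template probabilities into one discriminant in [0, 1]."""
    p_sig, p_bkg = 1.0, 1.0
    for v, e, s, b in zip(event, edges, sig_templates, bkg_templates):
        i = bin_index(v, e)
        p_sig *= s[i]
        p_bkg *= b[i]
    return p_sig / (p_sig + p_bkg)
```

Because only one-dimensional projections are used, correlations between the input variables are ignored, which keeps the method simple and robust when simulated samples are limited.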
The MJ discriminant uses a neural network to combine information from several input variables. The most important variables are the invariant mass of the E̸_T and the second leading jet, the scalar sum of the jet energies, the E̸_T, and the azimuthal angle between the E̸_T and the jets.
We combine the LF, ME, NN, BDT, and LFS channels using a super-discriminant (SD) technique similar to that which was applied in [7]. The SD method uses a neural network trained with neuro-evolution [20] to separate the signal from the background taking as inputs the discriminant outputs of the five analyses for each event.
With the super-discriminant analysis we improve the sensitivity (defined below) by 13% over the best individual analysis. We perform a simultaneous fit over the two exclusive channels, MJ and SD, to obtain the final combined results.
Before investigating the sample of selected events, we used background-dominated data control samples to check the modeling of each input variable as well as the output distributions of each multivariate discriminant. For the ℓ + E̸_T + jets analyses the control samples used are the lepton + b-tagged four-jet sample, which is enriched in tt̄ events, and the two- and three-jet samples in which there is no b-tagged jet. The latter are enriched in W+jets and QCD events with kinematics similar to the b-tagged signal samples and have high statistics, making it possible to observe that the background model describes the data well over three orders of magnitude in our output discriminants. For the MJ analysis, three control samples are used: in the first, the E̸_T is required to be aligned along one of the jets; in the second, the events are required to fail the NNQCD requirement; and in the third, a lepton is required to be present. The data distributions in all control samples are described well by our models for each of the analysis input variables and a large set of other variables not used as inputs. More than two thousand distributions were checked for evidence of mismodeling. Small discrepancies were found in the distributions of the angles between two jets in the untagged lepton + two-jet sample and the modeling of jets with rapidity greater than 2.4. These effects are included as systematic uncertainties on the shape of the background models.
Figure 1 shows the distributions of the five ℓ + E̸_T + jets discriminants. These are combined to give the SD distribution shown in Fig. 2 together with the MJ distribution. In the rightmost bins, assuming SM production and decay, the SD has an s/b that exceeds 5.0. This large s/b significantly reduces our sensitivity to systematic uncertainties affecting the background. We use the distributions of the SD and MJ discriminants to extract the measured cross section and the signal significance.
We measure the single top cross section using a Bayesian binned likelihood technique [21] assuming a flat prior in the cross section and integrating the posterior over all sources of systematic uncertainty. The background rates are varied within uncertainties, but are largely constrained by the data in the background-enriched portions of the SD and MJ discriminant distributions. Uncertainties on the shapes of these distributions degrade the extrapolation of these constraints to more signal-like regions. The sources of systematic uncertainties affecting these shapes are discussed below and are also included in all calculations. The uncertainties assigned were conservatively chosen to cover the full range of variations studied. We quote the measured cross section as the value that maximizes the posterior likelihood, and use the shortest interval containing 68% of the integral of the posterior to set the uncertainties. We calculate the significance as a p-value [21], which is the probability, assuming single top quark production is absent, that −2 ln Q = −2 ln [p(data|s+b)/p(data|b)] is less than that observed in the data. Figure 2(c) shows the distributions of −2 ln Q in pseudoexperiments that assume SM single top (S+B) and also those that assume single top production is absent (B), along with the value observed in data. The effects of the systematic uncertainties are included in the pseudoexperiments. We convert the observed p-value into a number of standard deviations using the integral of one side of a Gaussian function.
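The cross section extraction described above can be sketched for a simplified case: a binned Poisson likelihood with a flat prior in the cross section, the posterior mode as the central value, and the shortest 68% interval as the uncertainty. This sketch omits the integration over systematic uncertainties that the analysis performs, and the function names, grid scan, and inputs are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(sigma, data, sig_per_pb, bkg) -> float:
    """Binned Poisson log-likelihood; the expected yield is sigma * s_k + b_k.

    Assumes every bin has a positive background expectation b_k.
    """
    total = 0.0
    for n, s, b in zip(data, sig_per_pb, bkg):
        mu = sigma * s + b
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total


def posterior_summary(
    data: Sequence[int],
    sig_per_pb: Sequence[float],
    bkg: Sequence[float],
    sigma_max: float = 10.0,
    n_grid: int = 2001,
    cl: float = 0.68,
) -> Tuple[float, float, float]:
    """Return (mode, lower, upper) of the posterior for a flat prior on [0, sigma_max]."""
    grid = [sigma_max * i / (n_grid - 1) for i in range(n_grid)]
    logl = [log_likelihood(s, data, sig_per_pb, bkg) for s in grid]
    peak = max(logl)
    # Flat prior: the posterior is proportional to the likelihood on the grid.
    post = [math.exp(l - peak) for l in logl]
    norm = sum(post)
    post = [p / norm for p in post]
    mode_idx = max(range(n_grid), key=post.__getitem__)
    # Shortest interval: add grid points in order of decreasing posterior
    # density until the requested fraction of the integral is enclosed.
    order = sorted(range(n_grid), key=post.__getitem__, reverse=True)
    enclosed, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        enclosed += post[i]
        if enclosed >= cl:
            break
    return grid[mode_idx], grid[min(chosen)], grid[max(chosen)]
```

For a single-peaked posterior, the set of highest-density grid points is contiguous, so its end points bound the shortest interval with the requested content.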
All sources of systematic uncertainty are included and correlations between normalization and discriminant shape changes are considered. Uncertainties in the jet energy scale, b-tagging efficiencies, lepton identification and trigger efficiencies, the amount of initial and final state radiation, parton distribution functions, factorization and renormalization scale, and background modeling have been explored and incorporated in all individual analyses and the combination. We include uncorrelated MC statistical uncertainties in each bin of each discriminant distribution. A ±2.5 GeV/c^2 uncertainty on the top quark mass m_t is included in the significance and |V_{tb}| results, but we quote the dependence on m_t separately in the cross section. Table II lists the measured cross sections and significances for each of the component analyses and the combination. The measured cross sections for the five correlated analyses and the SD are close to each other even though the analyses choose different input variables and are optimized differently. We interpret the excess of signal-like events over the expected background as observation of single top production with a p-value of 3.10 × 10^-7, corresponding to a signal significance of 5.0 standard deviations. The sensitivity is defined to be the median expected significance and is in excess of 5.9 standard deviations, assuming the SM signal cross section. The most probable value of the combined s-channel and t-channel cross section is 2.3 +0.6 −0.5 pb assuming a top quark mass of 175 GeV/c^2. The dependence on the top quark mass is +0.02 pb/(GeV/c^2). From the cross section measurement at m_t = 175 GeV/c^2, we obtain |V_{tb}| = 0.91 ± 0.11 (stat+syst) ± 0.07 (theory [4]) and the limit |V_{tb}| > 0.71 at the 95% C.L., assuming a flat prior in |V_{tb}|^2 from 0 to 1. This is the most precise direct measurement of |V_{tb}| to date.
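The conversion from the quoted p-value to a number of standard deviations uses the one-sided Gaussian tail integral. A minimal sketch with the Python standard library follows; the function name is an assumption.

```python
from statistics import NormalDist


def p_value_to_sigma(p: float) -> float:
    """Return z such that the upper tail of a unit Gaussian beyond z equals p."""
    # By symmetry the upper-tail point is minus the lower-tail quantile;
    # this avoids the precision loss of computing 1 - p for very small p.
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    # p-value quoted in the text for the combined result.
    print(f"{p_value_to_sigma(3.10e-7):.1f} standard deviations")
```

With the quoted p-value of 3.10 × 10^-7 this gives a value just below 5, consistent with the stated significance of 5.0 standard deviations.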
In summary, we combine six multivariate analysis techniques to precisely measure the electroweak single top production cross section and the CKM matrix element |V_{tb}|. We have carefully cross-checked our analysis techniques with data control samples and we assign generous rate and shape uncertainties to all predictions we use. Our combined discriminant allows us to purify a signal