JRHS 2015; 15(2): 77-82

Copyright© Journal of Research in Health Sciences

Regression Dilution Bias in Blood Pressure and Body Mass Index in a Longitudinal Population-Based Cohort Study

Sima Masudi (PhD)a, Parvin Yavari (PhD)b*, Yadollah Mehrabi (PhD)a, Davood Khalili (PhD)c, Fereidoun Azizi (MD)d

a Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran

b Department of Health and Community Medicine, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran

c Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran

d Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

* Correspondence: Parvin Yavari (PhD), E-mail 1: parvinyavari@yahoo.com, E-mail 2: P.yavari-grc@sbmu.ac.ir

Received: 18 April 2015, Revised: 31 May 2015, Accepted: 20 June 2015, Available online: 24 June 2015


Background: Relying on a single measurement of a risk factor can distort its estimated effect because of random measurement error. This study examined the extent of underestimation in the estimated effects of variables commonly measured at physical examination, i.e. systolic and diastolic blood pressure (SBP, DBP) and body mass index (BMI), on cardiovascular disease in the Tehran Lipid and Glucose Study (TLGS).

Methods: A subsample (1167 men and 1786 women) of the original cohort, with replicate measures of the variables at triennial intervals, was used to calculate the regression dilution ratios (RDRs) in men and women. RDRs were determined by parametric and nonparametric methods. Hazard ratios (HR) of risk factors, per one standard deviation change, were corrected for regression dilution bias.

Results: Based on the RDRs estimated by the parametric method, the underestimation of the association in men and women was 45% and 35% for SBP and 54% and 64% for DBP, respectively. The HR of SBP was underestimated by 26% and 25%, and the HR of DBP by 23% and 33%, in men and women, respectively. The corresponding underestimation for BMI was about 8%. RDRs were fairly similar in men and women and across age groups by both methods. They were relatively constant during the 10-year follow-up for SBP and BMI.

Conclusions: Using only baseline measurements of blood pressure underestimates its real association with CVD events and the estimated HRs. The underestimation is independent of age and sex and can be fairly constant over short to moderate time intervals.

Keywords: Regression dilution bias, Systolic blood pressure, Diastolic blood pressure, Body mass index, Cardiovascular disease


Blood pressure and body mass index (BMI) are two risk factors commonly measured by physical examination in most epidemiologic studies. Blood pressure is liable to random measurement error arising from the technique and instrument used, the observer's skill in accurate reading and recording, and true biological variation within individuals. BMI (a combined measure of height and weight) has less variability than other risk factors 1, 2 because technical error is the major source of variation in its measurement. However, in long-term cohort studies with longer intervals between replicate measures, weight gain or loss can affect the magnitude of BMI.

Because of random error in measurements, the true values of risk factors are unobservable 3. The true value of a variable is defined as the average of a large number of measurements taken over a long period of time 4. Using a single measurement of a predictor biases its estimated effect toward the null value 5-7, a phenomenon known as regression dilution bias (RDB) 8. It results from the tendency for extreme values in a single measurement to be followed by less extreme ones on replication, which is called regression to the mean 9. Therefore, the findings of studies that do not take RDB into account are inaccurate to some degree, a problem that is often overlooked.

In the last two decades, correction for the effects of random measurement error has been considered in prospective cohort studies, and different methods have been proposed in the context of validation or reliability studies. Repeated measures of variables can be used to correct the underestimation 10. The widely accepted method is the use of the regression dilution ratio (RDR) 8, 11, a factor that shows the extent of attenuation in the regression coefficient for a given risk factor-disease association. The RDR is the ratio of the observed slope to the true underlying slope and is equal to the proportion of between-person variance to total variance (the reliability ratio) 10.

Parametric and nonparametric methods have been proposed to calculate the RDR. The nonparametric method was proposed by MacMahon in 1990 8 for estimating the effect of DBP on stroke. This method makes no assumption about the distribution of the data or the shape of the exposure-outcome relationship. A number of parametric methods, based on regression or correlation, have been proposed to correct the effect of RDB 10. These methods give valid results if their assumptions are met. Among parametric methods, Rosner's regression method has less strict assumptions. It can be used when the means of repeated measures are not equal, which is very likely in cohort studies with long intervals between measurements.

The aim of the present study was to determine the RDR for SBP, DBP and BMI in the Tehran Lipid and Glucose Study (TLGS), a community-based cohort study designed to determine the prevalence of risk factors for non-communicable diseases and their relationships with cardiovascular events in Tehran, Iran 12.
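The equality between the slope ratio and the reliability ratio can be checked with a small simulation. The sketch below is illustrative only: the between-person and within-person standard deviations, the true slope, and the function names are assumptions chosen for the example, not values from TLGS.

```python
import random

def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_dilution(n=20000, between_sd=15.0, within_sd=10.0,
                      true_slope=0.1, seed=1):
    """Return (reliability ratio, observed slope / true slope) for simulated data."""
    rng = random.Random(seed)
    # Long-term "usual" values: these carry the between-person variance.
    usual = [rng.gauss(130.0, between_sd) for _ in range(n)]
    # One error-prone reading per person: usual value plus within-person error.
    single = [u + rng.gauss(0.0, within_sd) for u in usual]
    # The outcome depends on the usual value, not on the single reading.
    outcome = [true_slope * u + rng.gauss(0.0, 1.0) for u in usual]
    reliability = between_sd ** 2 / (between_sd ** 2 + within_sd ** 2)
    slope_ratio = ols_slope(single, outcome) / ols_slope(usual, outcome)
    return reliability, slope_ratio

reliability, slope_ratio = simulate_dilution()
print(f"reliability ratio = {reliability:.3f}, observed/true slope = {slope_ratio:.3f}")
```

Under these assumptions the two printed quantities should be close, which is the sense in which the RDR measures attenuation of the regression coefficient.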


Study population

Participants of TLGS were selected by multistage cluster sampling from residents of district 13 of Tehran 12. Of 15005 individuals aged 3 years or more who participated in the baseline examination (shown here as exam1) between February 1999 and August 2001, 7907 were 30-74 years old, of whom 487 had a history of cardiovascular disease, 332 had missing data and 758 were lost to follow-up for annual outcome measurement, so the data of 6327 subjects (2705 men and 3622 women) remained for the present analysis.

Every three years, participants of TLGS were invited to complete a questionnaire and undergo medical examinations and biochemical tests. Three follow-up examinations were completed on average 3, 6 and 9 years after the baseline examination (exam2, exam3 and exam4, respectively). Of the 6327 participants, 3545 had completed all 3 re-examinations by October 2011. We excluded people who had missing data in at least one of the exams (n=384) or had a cardiovascular event before entering exam4 (n=208). Finally, 2953 subjects (1167 men and 1786 women) remained for calculating the RDRs.

Written informed consent was obtained from all participants, and the study was approved by the Ethics Committee of the Research Institute for Endocrine Sciences.


In TLGS, blood pressure was measured twice in the right arm, 5 minutes apart, with the participant seated. We used the average of the two readings as the individual's blood pressure. A digital scale and a tape measure were used to measure weight and height, respectively. Measurements were taken with minimal clothing and without shoes, in a standing position with the shoulders in a normal state. The details of the baseline examinations have been described before 13. BMI was calculated as weight (kg) divided by the square of height (m).

Statistical methods or analysis

RDRs were calculated by MacMahon's nonparametric method and Rosner's regression method 8, 10. To obtain RDRs by the parametric method (Rosner's method), repeated measures of the variables were regressed on the baseline measures. The confidence interval for each RDR was computed from the standard deviation of the regression coefficient, using the formula SD(βreg) = √var(βreg) ≈ √((1 - 1/λ^2)/n) 10, where λ is the reciprocal of the RDR, n is the number of participants, and βreg is the coefficient of the linear regression of the replicate measure of the variable of interest on its baseline measurement.
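The regression approach described above can be sketched as follows. This is a minimal illustration, assuming the RDR is taken as the ordinary least-squares slope of the replicate on the baseline measure and that the interval is the estimate plus or minus 1.96 standard deviations. The function name and the synthetic readings are hypothetical, not TLGS data.

```python
import math
import random

def regression_rdr(baseline, replicate, z=1.96):
    """RDR as the slope of replicate on baseline, with an approximate 95% CI."""
    n = len(baseline)
    mean_b = sum(baseline) / n
    mean_r = sum(replicate) / n
    sxy = sum((b - mean_b) * (r - mean_r) for b, r in zip(baseline, replicate))
    sxx = sum((b - mean_b) ** 2 for b in baseline)
    rdr = sxy / sxx
    # SD(beta_reg) ~ sqrt((1 - 1/lambda^2) / n) with lambda = 1/RDR, so 1/lambda = RDR.
    # max(0, ...) guards the square root in case the estimated slope exceeds 1.
    se = math.sqrt(max(0.0, 1.0 - rdr ** 2) / n)
    return rdr, (rdr - z * se, rdr + z * se)

# Synthetic readings (hypothetical values chosen for illustration).
rng = random.Random(7)
usual = [rng.gauss(120.0, 15.0) for _ in range(5000)]
exam1 = [u + rng.gauss(0.0, 10.0) for u in usual]
# The re-examination mean is shifted by 3 units; the slope is unaffected by this shift.
exam2 = [u + 3.0 + rng.gauss(0.0, 10.0) for u in usual]

rdr, (ci_low, ci_high) = regression_rdr(exam1, exam2)
print(f"RDR = {rdr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because a constant shift in the replicate changes only the intercept, the slope-based estimate remains usable when the means of repeated measures differ.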

For MacMahon's nonparametric method, participants were divided into quintile groups based on their baseline measurements of each variable. Group means, and the difference between the means of the upper and lower groups (called here the mean range), were obtained for all examinations. RDRs were calculated as the ratio of the mean range at each re-examination to the mean range at the baseline examination 8, 10.

Hazard ratios from Cox proportional hazards models, per 1 standard deviation change in each variable in relation to incident cardiovascular disease, were corrected using RDR1 (the RDR based on exam2, about 3 years after baseline) to show the effect of correction for regression dilution bias. We computed the percent change in hazard ratio after correction for RDB using the formula ((HRC-HRU)/HRU)*100, where HRC and HRU are the corrected and uncorrected hazard ratios. All statistical analyses were done in SPSS 20 (Chicago, IL, USA) and Excel 2007.
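The mean-range calculation can be sketched as below. This is an illustrative implementation under stated assumptions: groups are formed by sorting on the baseline value and cutting into equal-sized fifths, and the data are synthetic. The function names are hypothetical.

```python
import random

def mean_range(grouping_values, values, n_groups=5):
    """Mean of `values` in the top group minus the bottom group.

    Groups are always formed on `grouping_values` (the baseline measure).
    """
    order = sorted(range(len(grouping_values)), key=lambda i: grouping_values[i])
    size = len(order) // n_groups
    bottom, top = order[:size], order[-size:]

    def group_mean(indices):
        return sum(values[i] for i in indices) / len(indices)

    return group_mean(top) - group_mean(bottom)

def nonparametric_rdr(baseline, replicate, n_groups=5):
    """Ratio of the re-examination mean range to the baseline mean range."""
    return (mean_range(baseline, replicate, n_groups)
            / mean_range(baseline, baseline, n_groups))

# Synthetic readings (hypothetical values chosen for illustration).
rng = random.Random(11)
usual = [rng.gauss(120.0, 15.0) for _ in range(5000)]
exam1 = [u + rng.gauss(0.0, 10.0) for u in usual]
exam2 = [u + rng.gauss(0.0, 10.0) for u in usual]

rdr_np = nonparametric_rdr(exam1, exam2)
print(f"nonparametric RDR = {rdr_np:.2f}")
```

The key design point is that group membership is fixed at baseline, so the shrinking gap between the extreme groups at re-examination reflects regression to the mean.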
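The text does not spell out how the hazard ratio is corrected. A common approach, assumed here, is to divide the log hazard ratio by the RDR, which is equivalent to raising the uncorrected hazard ratio to the power 1/RDR. The sketch below applies that assumption together with the percent-change formula from the text. The input values and function names are hypothetical.

```python
import math

def correct_hazard_ratio(hr_uncorrected, rdr):
    """Assumed correction: divide the log hazard ratio by the RDR."""
    return math.exp(math.log(hr_uncorrected) / rdr)

def percent_change(hr_corrected, hr_uncorrected):
    """Percent change in hazard ratio after correction: ((HRC-HRU)/HRU)*100."""
    return (hr_corrected - hr_uncorrected) / hr_uncorrected * 100.0

hr_u = 1.30   # hypothetical uncorrected hazard ratio per 1 SD
rdr = 0.70    # hypothetical regression dilution ratio
hr_c = correct_hazard_ratio(hr_u, rdr)
change = percent_change(hr_c, hr_u)
print(f"corrected HR = {hr_c:.2f}, percent change = {change:.1f}%")
```

Under this assumption the percent change in the hazard ratio depends on both the RDR and the size of the uncorrected hazard ratio, and an RDR of 1 leaves the hazard ratio unchanged.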


About 43% of men and 49% of women in the cohort had completed 3 re-examinations. The mean age (SD) of men and women was 47.5 (12.3) and 46.3 (11.4) years in the whole cohort and 46.9 (11.9) and 45.3 (10.5) years in the subsample, respectively.

Table 1 shows the mean (SD) of SBP, DBP, and BMI for participants in all examinations of the TLGS. There was little change in the means of SBP and DBP during follow-up. SBP in both genders and DBP in men showed a small increase nine years after baseline. BMI increased mildly during follow-up, rising from 26.4 to 27.4 in men and from 28.4 to 30.2 in women.

Table 2 shows the changes in SBP, DBP, and BMI in five similar-sized groups based on the baseline measurement. Despite the small differences among the overall means of SBP and DBP over time in TLGS, the mean values of these 5 groups changed more, especially in the upper and lower groups from exam1 to exam2. For example, the mean SBP in the upper group decreased from 147.7 to 140.3 in men and from 149.5 to 139.9 in women. In contrast to the upper groups, the means of the lower groups increased mildly from exam1 to exam4, which illustrates the regression to the mean phenomenon. There were mild increments in the group means of BMI from baseline to exam4.

The difference between the means of the upper and lower quintiles of SBP, DBP, and BMI in both genders showed a sharp decrease from exam1 to exam2 and a modest decline in later examinations (Table 2). For example, the mean range of SBP in women declined from 50.5 in exam1 to 37.0, 35.2 and 33.7 in exam2, exam3 and exam4, respectively. BMI showed mild declines in the mean ranges over examinations for both genders.
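As a worked example of the mean-range ratio, the figures quoted above for SBP in women can be converted to nonparametric RDRs directly. Only the four mean ranges come from the text. The variable names are illustrative.

```python
# Mean range of SBP in women at each examination, as quoted in the text.
baseline_range = 50.5
followup_ranges = {"exam2": 37.0, "exam3": 35.2, "exam4": 33.7}

# Nonparametric RDR = mean range at re-examination / mean range at baseline.
rdr_by_exam = {exam: value / baseline_range for exam, value in followup_ranges.items()}

for exam, rdr in rdr_by_exam.items():
    print(f"{exam}: RDR = {rdr:.2f}")
# exam2: RDR = 0.73
# exam3: RDR = 0.70
# exam4: RDR = 0.67
```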

Table 3 shows the parametric and nonparametric estimates of RDRs. Both methods resulted in fairly similar estimates of RDRs for each variable in each exam. Furthermore, RDRs in each exam by both parametric and nonparametric methods were similar for men and women. For example, RDR1s for SBP by the two methods were 0.71 and 0.69 in men and 0.73 and 0.74 in women; all of them were about 0.7. RDR2 by parametric method for DBP was 0.53 in men and 0.50 in women; both were about 0.5.

We categorized participants by their age at baseline into three groups (30-44, 45-59, and 60-74 years) and recalculated RDRs by the parametric method (results not shown here). Despite some differences in the estimated RDRs for age groups in each exam, there was no clear increasing or decreasing trend across the age groups, and the 95% confidence intervals (CI) for the RDRs overlapped considerably.

The estimated RDRs for SBP and BMI were fairly constant over the study period by both methods. RDR1, RDR2 and RDR3 for SBP were 0.69, 0.72, and 0.67 in men and 0.74, 0.68, and 0.66 in women; all were about 0.7, which means about 40% (1/RDR-1) underestimation of the real association for SBP. RDRs for BMI in men and women were all about 0.9, which implies about 10% underestimation in the estimated effect of this variable.

For DBP, RDR2 and RDR3 were similar and smaller than RDR1. In general, DBP showed the greatest within-person variability compared to SBP and BMI. RDR1 for DBP indicated 54% and 64% underestimation in the estimated effect in men and women, respectively. There were 80% and 90% underestimations in the estimated effect of DBP in men and women based on RDR2 and RDR3.

We restricted the analysis to participants who did not take any anti-hypertensive drugs and found similar RDRs for SBP and DBP in all exams (data not shown).
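The percentages can be reproduced by reading the underestimation as 1/RDR - 1, that is, the proportion by which the true slope exceeds the observed one. This reading is an interpretation, chosen because it matches the reported figures. The sketch below applies it to the RDR1 values for SBP quoted above and to the rounded BMI value of 0.9. The function name is illustrative.

```python
def underestimation_percent(rdr):
    """Percent by which the true slope exceeds the observed slope, (1/RDR - 1) * 100."""
    return (1.0 / rdr - 1.0) * 100.0

for label, rdr in [("SBP, men (RDR1)", 0.69),
                   ("SBP, women (RDR1)", 0.74),
                   ("BMI (rounded)", 0.90)]:
    print(f"{label}: RDR = {rdr:.2f}, underestimation = {underestimation_percent(rdr):.0f}%")
# SBP, men (RDR1): RDR = 0.69, underestimation = 45%
# SBP, women (RDR1): RDR = 0.74, underestimation = 35%
# BMI (rounded): RDR = 0.90, underestimation = 11%
```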

Table 4 shows uncorrected and corrected hazard ratios per one standard deviation change in SBP, DBP and BMI. Because of the small amount of random error in BMI, its corrected HRs changed little. Correction for regression dilution bias widened the 95% CIs of the corrected HRs, so that the corrected HRs for BMI were not significant.

Correction was made based on correction factor obtained by using exam2 data.

Table 1: Mean (SD) of repeated measures of SBP, DBP, and BMI in TLGS

Table 2: Group a means of SBP, DBP, and BMI and the difference between upper and lower groups

Table 3: Nonparametric and parametric estimates of regression dilution ratios (RDR) for systolic blood pressure, diastolic blood pressure, and body mass index for men and women 3, 6, and 9 years after baseline

Table 4: Corrected a and uncorrected hazard ratios for 10 years CVD events in men and women for one standard deviation b change in systolic blood pressure (SBP), diastolic blood pressure (DBP), and body mass index (BMI)


We determined the extent of underestimation in the real association of SBP, DBP, and BMI with cardiovascular diseases due to regression dilution bias. Correction for regression dilution bias increased the estimated effect size (here, the hazard ratio) to a degree that depends on the magnitude of the RDR and of the uncorrected HR.

Underestimation for SBP was about 45% in men and 35% in women based on replicate measures 3 years after baseline. For DBP, it was 54% in men and 64% in women. It was about 8% for BMI in both genders. These findings show that, among these variables, DBP had the greatest and BMI the lowest within-person variability. The lower underestimation for BMI may reflect the fact that technical error is the major source of error in measuring BMI. For DBP, in addition to biologic variation, difficulty in hearing the fifth Korotkoff sound might be a major source of the observed within-person variability. RDRs reported for these variables in other studies differed to some degree. In the studies by Whitlock et al. 1 and Knuiman et al. 14, underestimation in the association for SBP was about 53% and 80%, respectively. Underestimations for DBP were 67%-167% in three cities in Europe over three consecutive annual examinations 15, and 51% and 67% in the Framingham study 2 and 4 years after baseline 8. Underestimation for BMI in the Framingham study was 3% after 6 and 26 years, and it was 20% after 26 years in the Whitehall study 11. In a recent study by Wormser and White 2, it was 4%. The differences among these studies might be due to several factors, such as the time interval between the baseline and replicate measurements and the extent of random error in the observed values of risk factors in different studies.

We calculated RDRs by parametric and nonparametric methods. The nonparametric method is based on the ratio of mean ranges and makes no assumption about the shape of the data, which makes it suitable for studies with long intervals between replicate and baseline measurements 11. In this method, the number of groups and their boundaries are arbitrary and have no effect on the estimated RDRs 11. However, the method utilizes the phenomenon of regression to the mean, in which the more extreme the chosen group boundaries, the greater the regression to the mean effect 16. In addition, more extreme boundaries leave only a small number of participants in the extreme groups. In this study, we used quintiles, which provided a similar number of participants in each group. Furthermore, we used Rosner's regression method, which provides valid estimates of RDRs when the means of replicate measures are not equal. When the variances of repeated measures are equal, the two methods give similar results 11. In the present study, the estimated RDRs were similar by both methods.

We found similar RDRs for men and women. We also calculated RDRs for three age groups of participants based on their age at the time of entering the study. There was no specific increasing or decreasing trend among age groups, and the considerable overlap between the 95% CIs implied no significant differences among them. These results were similar to the findings of Clarke et al. 11 and demonstrated that the estimated RDRs were independent of age and gender.

The mean range of replicate measures decreases with the time interval from baseline, which suggests the rising importance of within-person variability over time 8, 11, 17. Similarly, RDRs decrease over time, and the extent of the reduction depends on the length of time from the baseline measurements 10, 11, 14, 17. In our study there was a great difference between the mean ranges of exam2 and baseline, but the mean ranges for the measurements 6 and 9 years after baseline were similar, which resulted in close RDRs for these measurements. Similar closeness in RDRs was observed with the parametric method. The close RDRs for SBP and especially BMI across the three replicate measures, and the substantial overlap among the 95% CIs, reveal that the RDRs of these variables were relatively constant over the 10-year follow-up. This may be a result of the time intervals between replicate measures in our study, which were on average 3 years. Clarke et al. 11 estimated RDRs of SBP, DBP and cholesterol for 6, 16, and 26 years after baseline in the Framingham study. For example, RDRs of SBP for these time intervals in their study were 0.63, 0.45 and 0.31, respectively. RDRs for SBP were 0.72, 0.70 and 0.67 for measurements 2, 4, and 6 years after baseline in the Framingham study 17. However, in our study the RDRs for DBP 6 and 9 years after baseline differed more from the RDRs 3 years after baseline. The underestimation in the association for the measurements 6 and 9 years after baseline was 80% in men and 90% in women, whereas it was 54% and 64% for the measurements 3 years after baseline.

In this study, the RDR of SBP in men based on replicate measures 6 years after baseline was greater than the RDR 3 years after baseline. Group means of SBP in men at exam3 increased in all groups compared to the second measurement, which produced a larger mean range. This might have reduced the apparent within-person variability at the third measurement. The decline in SBP and DBP of men and women during follow-up may be a result of entering the study and of participants' increased awareness of their blood pressure level, which leads people to seek treatment and manage their blood pressure. We noticed that the proportion of men and women who took antihypertensive drugs increased in exam2. The mean SBP was fairly constant in women in exam3, but it increased in men during that time. We found similar RDRs after excluding the participants who took antihypertensive drugs at baseline or during follow-up (data not shown). In our study, the means of the variables in exam4 were generally greater than the baseline means, except for DBP in women, which was relatively constant. BMI of the participants increased progressively during the follow-up period. The increase in weight and age of the participants might explain some of the increase in their blood pressure in exam4, but the measurements 3 and 6 years after baseline also showed some changes. Considering these changes alongside the increase in age and BMI makes the observed differences difficult to interpret. It is very hard to specify whether these differences result from real changes in these variables over time or from random or systematic errors in the replicate measurements.

This study may have some flaws. The subsample used to calculate RDRs should be representative of all participants. In our study the means of the variables in the subsample were similar to those of all participants in the original cohort. The mean ages were not very different, and the proportion of people who took antihypertensive drugs was similar in the subsample and the original cohort.

We corrected the hazard ratios for these variables to show the effect of correction for regression dilution bias on the estimated effect size. We assumed that other variables were measured without error. Random measurement error in one variable can affect the estimated effects of other variables that are measured without error. In this case, if the study aims to obtain the real associations for all variables in the model, suitable correction methods, such as regression calibration or SIMEX 5, should be used. Moreover, when one variable is measured with error, the bias is toward the null value, but when more than one variable is prone to random error the bias can be in either direction, and simple correction methods are not suitable 7, 18, 19.

Our results revealed the existence of underestimation in the results of previous studies published from TLGS data. For example, the hazard ratio of SBP for CVDs in the TLGS was lower than that in the Framingham study (3.5 vs. 16.8 for a one unit increase of Ln(SBP) in mmHg) 20; RDB might explain some of the difference, although no correction for RDB was considered in the results of the Framingham study either. The underestimation of the effect of hypertension on CVD might affect the fraction of cardiovascular risk attributed to hypertension at the population level 21, 22. Moreover, RDB might influence the results of a factor analysis that includes blood pressure measurements 23.


We found an underestimation in the real association of about 40% for SBP, 60-90% for DBP, and 10% for BMI in TLGS. This demonstrates that the effect of blood pressure on the occurrence of diseases, including CVDs, is stronger than the effects obtained in previous studies without correction for RDB. The underestimations were similar for men and women and were fairly constant during the 10-year follow-up in this study.


This work was derived from S. Masudi's PhD thesis in the Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, and was supported by the Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences.

Conflict of interest statement

The authors declare no conflict of interest.


  1. Whitlock G, Clark T, Vander Hoorn S, Rodgers A, Jackson R, Norton R. Random errors in the measurement of 10 cardiovascular risk factors. Eur J Epidemiol. 2001;17(10):907-909.
  2. Wormser D, White IR, Thompson SG, Wood AM. Within-person variability in calculated risk factors: Comparing the aetiological association of adiposity ratios with risk of coronary heart disease. Int J Epidemiol. 2013;43:849-859.
  3. Zhang X, Tomblin JB. Explaining and Controlling Regression to the Mean in Longitudinal Research Designs. J Speech Lang Hear Res. 2003;46:1340-1351.
  4. Bland JM, Altman DG. Measurement error. BMJ. 1996;313(7059):744.
  5. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective. 2nd ed. Boca Raton: Taylor and Francis Group; 2006.
  6. Fuller WA. Measurement error models. New York: Wiley; 1987.
  7. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med. 1989;8(9):1051-1069.
  8. MacMahon S, Peto R, Cutler J, Collins R, Sorlie P, Neaton J, Abbott R. Blood pressure, stroke, and coronary heart disease. Part 1, Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet. 1990;335(8692):765-774.
  9. Bland JM, Altman DG. Regression towards the mean. BMJ. 1994;308:1499.
  10. Frost C, Thompson SG. Correcting for regression dilution bias: comparison of methods for a single predictor variable. J Royal Statistic Soc: Series A. 2000;163:173-189.
  11. Clarke R, Shipley M, Lewington S, Youngman L, Collins R, Marmot M. Underestimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Am J Epidemiol. 1999;150(4):341-353.
  12. Azizi F, Rahmani M, Emami H, Madjid M. Tehran lipid and glucose study: rationale and design. CVD Prevention. 2000;3:242-247.
  13. Azizi F, Rahmani M, Emami H, Mirmiran P, Hajipour R, Madjid M. Cardiovascular risk factors in an Iranian urban population: Tehran Lipid and Glucose Study (phase 1). Soz Praventivmed. 2002;47(6):408-426.
  14. Knuiman MW, Divitini ML, Buzas JS, Fitzgerald PEB. Adjustment for regression dilution in epidemiological regression analyses. Ann Epidemiol. 1998;8:56-63.
  15. Hughes MD, Pocock SJ. Within-subject diastolic blood pressure variability: implications for risk assessment and screening. J Clin Epidemiol. 1992;45(9):985-998.
  16. Davis CE. The effect of regression to the mean in epidemiologic and clinical studies. Am J Epidemiol. 1976;104(5):493-498.
  17. Clarke R, Lewington S, Youngman L, Sherliker P, Peto R, Collins R. Underestimation of the importance of blood pressure and cholesterol for coronary heart disease mortality in old age. Eur Heart J. 2002;23(4):286-293.
  18. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132(4):734-745.
  19. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. Am J Epidemiol. 1992;136(11):1400-1413.
  20. Khalili D, Hadaegh F, Soori H, Steyerberg EW, Bozorgmanesh M, Azizi F. Clinical usefulness of the Framingham cardiovascular risk profile beyond its statistical performance: the Tehran Lipid and Glucose Study. Am J Epidemiol. 2012;176(3):177-186.
  21. Bozorgmanesh M, Hadaegh F, Mohebi R, Ghanbarian A, Eskandari F, Azizi F. Diabetic population mortality and cardiovascular risk attributable to hypertension: a decade follow-up from the Tehran Lipid and Glucose Study. Blood Press. 2013;22(5):317-324.
  22. Khalili D, Sheikholeslami FH, Bakhtiyari M, Azizi F, Momenan AA, Hadaegh F. The incidence of coronary heart disease and the population attributable fraction of its risk factors in Tehran: a 10-year population-based cohort study. PLoS One. 2014;9(8):e105804.
  23. Bahar A, Hosseini Esfahani F, Asghari Jafarabadi M, Mehrabi Y, Azizi F. The structure of metabolic syndrome components across follow-up survey from childhood to adolescence. Int J Endocrinol Metab. 2013;11(1):16-22.
