Non-radiographic methods of measuring global sagittal balance: a systematic review

Background: Global sagittal balance, describing the vertical alignment of the spine, is an important factor in the non-operative and operative management of back pain. However, the typical gold standard method of assessment, radiography, involves radiation exposure and increased cost, making it unsuitable for repeated use. Non-radiographic methods of assessment are available, but their reliability and validity in the current literature have not been systematically assessed. Therefore, the aim of this systematic review was to synthesise and evaluate the reliability and validity of non-radiographic methods of assessing global sagittal balance.
Methods: Five electronic databases were searched, and methodology was evaluated by two independent reviewers using the 13-item Brink and Louw critical appraisal tool for reliability and validity studies.
Results: Fourteen articles describing six methodologies were identified from 3940 records. The six non-radiographic methodologies were biophotogrammetry, plumbline, surface topography, infra-red motion analysis, spinal mouse and ultrasound. Construct validity was evaluated for surface topography (R = 0.49 and R = 0.68, p < 0.001), infra-red motion analysis (ICC = 0.81) and plumbline testing (ICC = 0.83). Reliability ranged from moderate (ICC = 0.67) for spinal mouse to very high for surface topography (Cronbach α = 0.985). Measures of agreement ranged from 0.9 mm (plumbline) to 22.94 mm (infra-red motion analysis). Variability in study populations, reporting parameters and statistics prevented a meta-analysis.
Conclusions: The reliability and validity of non-radiographic methods of measuring global sagittal balance were reported in 14 identified articles. Based on this limited evidence, non-radiographic methods appear to have moderate to very high reliability and, for the three methodologies in which validity was evaluated, moderate to high validity. The overall quality and methodological approaches of the included articles were highly variable. Further research should focus on the validity of non-radiographic methods, with greater adherence to reporting actual and clinically relevant measures of agreement.


Background
Progressive stooped posture, a common consequence of the ageing process, is associated with poor quality of life [1,2]. This posture, which can be described according to the vertical alignment of the trunk over the pelvis, is defined as global sagittal balance and is termed anterior sagittal balance when exceeding predetermined threshold values. Anterior sagittal balance is the postural deformity that is most closely correlated with pain, activity limitations and reduced quality of life [2] and affects up to 29% of the population above 60 years of age [3].
The current gold standard for measurement of global sagittal balance is the sagittal vertical axis (SVA) obtained via radiographs. SVA is quantified by measuring, in centimetres, the horizontal distance from the centre of the C7 vertebral body to the postero-superior border of the sacrum on full-length lateral spine radiographs [1]. This requires the use of spine-specific radiographic software [4], which demonstrates excellent intra-rater (ICC = 0.98) and inter-rater (ICC = 0.95) reliability and excellent accuracy between inter-rater tests (ISO reproducibility of 4.02 mm) [5]. SVA thresholds defining anterior sagittal balance range from 3 to 6 cm [6-10]. Alternative radiographic methods of sagittal spine balance measurement, which do not require spine-specific radiographic software, include the angular measurements of T1 spinal inclination (T1Spi) and C7-S1 trunk inclination [11]. T1Spi has been reported to be more closely correlated than SVA with clinical outcomes evaluated by the Oswestry Disability Index, Short Form-12 and SRS-23 [11].
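The SVA definition above reduces to a simple horizontal offset, which can be sketched as follows. This is a minimal illustration, not the algorithm of any radiographic software package: the function names, the sign convention (positive when C7 lies anterior to the sacral reference point) and the default 5 cm cut-off are assumptions chosen from within the 3 to 6 cm range of published thresholds.

```python
def sagittal_vertical_axis_cm(c7_centre_x_mm: float, sacrum_corner_x_mm: float) -> float:
    """Horizontal offset of the C7 centre from the postero-superior sacrum, in cm.

    Assumed convention (not from the source): x increases anteriorly, so a
    positive result means C7 lies in front of the sacral reference point.
    """
    return (c7_centre_x_mm - sacrum_corner_x_mm) / 10.0


def exceeds_anterior_threshold(sva_cm: float, threshold_cm: float = 5.0) -> bool:
    """True when SVA exceeds the chosen threshold.

    The 5.0 cm default is an illustrative value; published thresholds
    range from 3 to 6 cm.
    """
    return sva_cm > threshold_cm


# Hypothetical landmark positions read from a lateral radiograph (mm).
sva = sagittal_vertical_axis_cm(c7_centre_x_mm=62.0, sacrum_corner_x_mm=10.0)
flagged = exceeds_anterior_threshold(sva)
```

Because the classification depends on which threshold is chosen, the same SVA value may or may not be labelled anterior sagittal balance across studies.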
Recent advances in surgical and non-surgical spine management have revealed the importance of identifying, maintaining or restoring sagittal balance in order to reduce pain, improve function and quality of life, and reduce post-operative complications following spine surgery [11,12]. Physiotherapy treatment aimed at restoring sagittal balance, primarily by increasing lumbar lordosis, has likewise been demonstrated to improve clinical outcomes in patients with chronic lower back pain [13]. Therefore, the measurement of global sagittal balance is important for the development and monitoring of effective spine therapy interventions.
Although radiographs are the current gold standard, repeated radiographic exposure potentially increases the lifetime risk of cancer development [13]. This is compounded when considering that lateral full spine radiographs can deliver an effective radiation dose that is 50-70% higher than standard posterior-anterior (PA) full spine radiographs [14]. Therefore, due to the high cost and radiation exposure, repeated radiographic measurement and monitoring of sagittal balance in the clinical setting has serious limitations [13]. Non-radiographic methods of measuring global sagittal balance are available and may present a viable option for monitoring patient progress. These methods vary with regard to technical complexity and equipment cost. However, the currently available methods and their psychometric properties have not been assessed systematically. Therefore, the aim of this systematic review was to evaluate the reliability and validity of non-radiographic methods of assessing global sagittal balance.

Protocol and registration
This review protocol was registered in August 2014 with the PROSPERO International prospective register of systematic reviews (ID PROSPERO 2014:CRD42014013071).

Data sources
Electronic database searches of MEDLINE, EMBASE, Web of Science, CINAHL and AMED were conducted from database inception until week 38, September 2016. The search terms were based on three main term groups: sagittal alignment, psychometric properties and physical tests. The Boolean term "OR" was used within each term group and the Boolean term "AND" was used between each term group. Additional hand searches of relevant bibliographies were completed (Appendix).
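The Boolean structure described above (OR within each term group, AND between groups) can be sketched as a small query builder. The terms listed are illustrative placeholders only and are not the search strings used in this review; the function name is likewise hypothetical.

```python
def build_query(term_groups: list[list[str]]) -> str:
    """Join terms with OR inside each group and join the groups with AND."""
    clauses = []
    for terms in term_groups:
        quoted = [f'"{term}"' for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Illustrative placeholder terms for the three groups named in the text:
# sagittal alignment, psychometric properties and physical tests.
groups = [
    ["sagittal balance", "sagittal alignment"],
    ["reliability", "validity"],
    ["plumbline", "surface topography"],
]
query = build_query(groups)
```

Database-specific syntax (field tags, truncation symbols, subject headings) would still need to be added for each database.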

Eligibility criteria
Studies were included if they reported reliability and/or validity of non-radiographic methods of measuring standing global sagittal spine parameters in people with or without spine deformity or pain. All studies were considered regardless of publication date, age of participants or language.

Study selection
Two independent reviewers (LC, SK), after trialling a small pilot study, screened the titles and abstracts for eligible studies and reviewed the full texts of those identified. Full texts were retrieved if one reviewer determined that the record could not be excluded by title or abstract. In cases of disagreement, a third reviewer (EP) adjudicated. Bibliographies of included studies were searched for additional references.

Data extraction
In order to extract comprehensive methodological, population and psychometric data, two independent reviewers (LC, SK) used a 13-item critical appraisal tool developed by Brink and Louw [15]. The Brink and Louw critical appraisal tool was developed from the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) and Quality Appraisal of Diagnostic Reliability Studies (QAREL) tools to appraise combined or independent reliability and validity studies [16]. The extracted data included a description of the study population and raters, and detailed descriptions of blinding, randomisation, time intervals between tests, testing procedures, withdrawals and statistical methodology. Disagreement was resolved by consensus and, if necessary, in consultation with a third reviewer (EP). Authors of articles in which the results or methodology were unclear were contacted for clarification.

Quality assessment
Methodological quality of individual studies was evaluated using the Brink and Louw critical appraisal tool and synthesised within the summary tables. Articles were considered high quality if they scored greater than the accepted 60% threshold on the Brink and Louw critical appraisal tool [16].
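The 60% rule above can be expressed as a short scoring sketch. The handling of non-applicable items is an assumption made for illustration (they are excluded from the denominator, since some items apply only to reliability or only to validity studies); the function names and rating labels are hypothetical.

```python
def quality_score_percent(item_ratings: list[str]) -> float:
    """Percentage of applicable items rated 'yes'.

    Assumption (not stated in the source): items rated 'na' are dropped
    from the denominator rather than counted as 'no'.
    """
    applicable = [rating for rating in item_ratings if rating != "na"]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    yes_count = sum(1 for rating in applicable if rating == "yes")
    return 100.0 * yes_count / len(applicable)


def is_high_quality(score_percent: float, threshold: float = 60.0) -> bool:
    """High quality means scoring strictly greater than the threshold."""
    return score_percent > threshold


# Hypothetical ratings for two studies on the 13-item tool.
study_a = ["yes"] * 8 + ["no"] * 5                 # 8 of 13 applicable items
study_b = ["yes"] * 6 + ["no"] * 4 + ["na"] * 3    # 6 of 10 applicable items
score_a = quality_score_percent(study_a)
score_b = quality_score_percent(study_b)
```

Under this reading, a study scoring exactly 60% is not classed as high quality, because the text specifies "greater than" the threshold.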

Studies included in the review
The database search strategy retrieved a total of 3940 records. After removal of duplicates, 2685 of the remaining citations were excluded as they did not meet the inclusion criteria. Following full text review of 114 articles, 14 articles met the inclusion criteria. The flow of articles through the review process is depicted in the PRISMA flow diagram (Fig. 1). We contacted the lead authors of three included studies: one German-language article, for further information on methodology [19], and two English-language studies, to clarify the reported units of measurement [20] and methods of measurement [21].

Global sagittal balance measurement methods
A total of 14 studies describing six global sagittal balance measurement methods were included in the review. Two studies measured construct validity, one by root mean square deviation [19] and one by ICC [21]; two measured both construct validity and reliability [13,22]; and 10 studies [20,23-31] investigated the reliability of the sagittal balance measurement methods. A description of each non-radiographic measurement method is provided in Table 1. Of the four studies reporting validity, three compared surface topography to radiographically measured angular trunk inclination [13,22] or radiographic SVA [19]. The fourth validity study compared plumbline and infra-red (IR) motion analysis to radiographic SVA [21]. Nine studies examined inter- and intra-rater reliability [13,19,20,22-25,29,31], and three studies examined test-retest time interval reliability [26-28]. Five studies evaluated the reliability of surface topography, two studies each evaluated spinal mouse, plumbline testing and biophotogrammetry, and one study evaluated ultrasonic testing.
In terms of the outcome variables, trunk inclination was measured in four studies: two using spinal mouse [23,24] and two using surface topography [13,22] methodology. The distance from a plumbline reference line to the cervical or lumbar lordosis apex and the S1 landmark point was measured in four studies.

Table 1 Description of each non-radiographic measurement method

Biophotogrammetry
Description: The distance from a plumbline to the lordotic and cervical apex [25] or the C7 and S1 prominences [30], measured from lateral posture photographs.
Equipment: Digital camera with a vertical plumbline reference posterior to the subject within the field of view and a known (presized) object within the field of view to establish distance scaling. Computer with graphic editing software.
Procedure: Adhesive stickers that can be seen from the lateral margin of the body are placed on the C7 and S1 landmarks. After calibration, the distance from the plumbline to the landmark points is measured using graphic editing software.
References: [25,30]

Infra-red motion analysis
Description: Motion analysis computer-interfaced stereovideographic acquisition of infra-red-activated anatomical markers at C7 [21,26], T1 [28] and S1.
Equipment: Minimum of three motion analysis cameras linked to a computer via an image processor. Infra-red light reflected on the adhesive markers.
Procedure: Adhesive infra-red markers are affixed to C7/T1 and S1. The markers are activated by infra-red light and the dedicated computer system triangulates the spine data, measuring the sagittal arrows.
References: [21,26,28]

Plumbline
Description: A ruler and plumbline are used to measure the distance to the C7 and L3 [29,31], or C7 and S1 [21], anatomical points on the body.
Equipment: Ruler and plumbline.
Procedure: The plumbline is held against or very near to the posterior surface of the skin. The distance from the plumbline to C7 and L3 or S1 is measured.
References: [21,29,31]

Spinal mouse
Description: Spinal mouse assessment uses a wireless computer-interfaced rollerball input device to determine the inclination of the spine from C7 to S1 relative to the vertical.
Equipment: Spinal Mouse (Idiag, Volketswil, Switzerland) and computer.
Procedure: The spinal mouse is rolled along the contour of the spine from C7 to S1, measuring distance of travel and angulation.
References: [23,24]

Surface topography
Description: Surface topography based on Moiré stereovideography measures the distortion of a projected light grid to create a 3D model of the back, providing angular or distance offset data from the vertebra prominens (C7 or T1) to the midpoint between the PSIS.

Ultrasound
Equipment: Freepoint ultrasound system (GTCO Calcomp, Scottsdale, USA) and interfaced computer.
Procedure: The freepoint probe is used to identify the T1 and S1 landmarks, which are triangulated and digitised, allowing for computerised 3D reconstruction.
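
The calibration step described for biophotogrammetry in Table 1, in which an object of known size in the field of view sets the distance scale, can be sketched as below. This is a minimal illustration assuming that the plumbline is vertical in the image and that the reference object, plumbline and landmarks lie at a similar distance from the camera; the function names and pixel values are hypothetical.

```python
def mm_per_pixel(known_length_mm: float, known_length_px: float) -> float:
    """Scale factor derived from a reference object of known physical size."""
    if known_length_px <= 0:
        raise ValueError("reference object length in pixels must be positive")
    return known_length_mm / known_length_px


def plumbline_offset_mm(landmark_x_px: float, plumbline_x_px: float, scale: float) -> float:
    """Horizontal distance from a vertical plumbline to a landmark, in mm."""
    return abs(landmark_x_px - plumbline_x_px) * scale


# Hypothetical photograph: a 100 mm reference object spans 250 pixels.
scale = mm_per_pixel(known_length_mm=100.0, known_length_px=250.0)
c7_offset = plumbline_offset_mm(landmark_x_px=400.0, plumbline_x_px=525.0, scale=scale)
```

Any difference in camera distance between the reference object and the landmarks would bias the scale factor, which is one reason calibration details matter when comparing studies.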

Quality assessment
The average quality of the 14 studies was 56% (range 44-77%) (Table 2). One validity and reliability study [22], two validity studies [19,21] and three reliability studies [23,25,27] were of high quality, scoring > 60% on the critical appraisal tool. The main items with low scores were a suitable description of the raters (71% of studies unreported), within-rater blinding (77% of studies unreported), variation of testing order between raters (92% of studies unreported) and a suitable explanation of withdrawals from the study (92% of studies unreported).

Participants
Healthy adult participants were evaluated in five studies [20,24,27,28,30] and healthy children in one study [23]. Four studies evaluated participants with spine deformity or pain; three included adolescents [22,26,31] and one involved adults [13]. One study evaluated children, adolescents and adults with spine deformity [19], one study evaluated adults who had demonstrated clinical manifestation of mouth breathing during childhood [25] and another study evaluated adults with camptocormia [21]. Sample sizes for the validity studies ranged from 95 [19] to 326 [13] participants for the two surface topography studies and 49 participants for the plumbline and IR motion study [21]. Reliability study sample sizes ranged from two participants examined once by five raters (inter-rater) and 15 times by one rater (intra-rater) [13] to 180 participants examined by two raters (inter-rater) and then re-examined after 5 min by one rater (intra-rater) [29]. Only four studies included participants with a mean age greater than 30 years [13,21,24,30].

Validity and reliability
Validity
Correlations between non-radiographic and radiographic methods of measuring global sagittal balance ranged from low to high (Table 3). Liljenqvist et al. [19] compared surface topography sagittal trunk offset distance to radiographic SVA and reported a root mean square deviation (RMSD) of 1.07 cm. Legaye [13] compared surface topography trunk inclination to radiographically determined C7-S1 global sagittal axis and reported a moderate and significant correlation of r = 0.68 (p < 0.001). Knott et al. [22] compared surface topography sagittal trunk inclination to radiographically determined SVA inclination and reported a low Pearson correlation of 0.49. de Seze et al. [21] compared radiographic SVA to plumbline and IR motion analysis and reported high ICCs of 0.81 and 0.83 respectively.
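
Two of the statistics reported above, RMSD and the Pearson correlation, answer different questions, as the following sketch illustrates. The paired values are invented for illustration and are not data from any included study; the function names are hypothetical.

```python
import math


def rmsd(a: list[float], b: list[float]) -> float:
    """Root mean square deviation between paired measurements."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson product-moment correlation between paired measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must be of equal length with at least two pairs")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Invented paired measurements (cm): the second method reads 1 cm higher throughout.
radiographic_sva = [2.0, 4.0, 6.0, 8.0]
surface_offset = [3.0, 5.0, 7.0, 9.0]
deviation = rmsd(radiographic_sva, surface_offset)
correlation = pearson_r(radiographic_sva, surface_offset)
```

In this constructed example the correlation is perfect even though every measurement differs by 1 cm, which shows why a correlation coefficient alone does not establish agreement between methods.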

Discussion
The aim of this systematic review was to identify, synthesise and summarise the reliability and validity of the non-radiographic global sagittal balance measurement methods. Several methods that vary widely in cost and technological complexity were identified; of these, plumbline testing, surface topography and IR motion analysis had the most supporting evidence. Surface topography had low to moderate validity, very high reliability and high accuracy, although its accuracy was lower than that of plumbline testing. IR motion analysis had high validity and reliability with moderate accuracy. The overall quality rating of the studies was below the 60% threshold for a high rating, and they displayed a lack of homogeneity with regard to participants, reporting variables and methods of measuring agreement.
The present systematic review found that the plumbline method, which is the least technologically advanced and least expensive method, has high validity [21] and high reliability [29,31]. This suggests that the plumbline method, which is easily accessible to clinicians and requires little training, can provide quantifiable data and offer higher intra-rater precision than the other methods. However, a note of caution is due here, as de Seze et al.'s [21] validity results were obtained from a sample of Parkinson's disease patients exhibiting camptocormia (SVA 110 ± 11 mm), limiting generalisability to other populations.
Surface topography, unlike the other methods of measurement and with very little operator involvement, is able to provide, in one scan, the widest variety of sagittal balance measurements, including trunk inclination, distance offset measurements and sagittal arrows distance measurements. The reliability scores for inter-rater, intra-rater, inter-day and intra-day testing, including one from a high-quality study [27], ranged from high to very high (ICC 0.86-0.98). However, the validity scores ranged from moderate (Pearson's r of 0.68) in a low-quality study [13] to low (Pearson's r of 0.49) in a high-quality study [22]. There was little consistency in the reporting of limits of agreement between surface topography and SVA, with Liljenqvist et al. [19] reporting a distance offset RMSD of 1.07 cm and Knott et al. [22] an angular average difference of ± 3.7°. This suggests a level of inaccuracy, and further work to establish clinical limits of agreement is needed, given that radiographic SVA threshold ranges defining anterior sagittal balance are 3-5 cm [6-9,13].
Our results are confounded by the inconsistent selection of superior and inferior landmarks between the included studies; moreover, not all sagittal balance parameters can be measured with the same accuracy and reliability. Furthermore, the surrogate outcomes provided by non-radiographic measurement raise the question of whether manually palpated surface landmarks accurately correspond to radiographic landmarks. Robinson et al. reported moderate inter-rater palpation agreement (67% within 10 mm) and moderate agreement with radiographically determined L5 (kappa 0.48) but poor agreement with radiographically determined C7 (kappa 0.18) [33]. Kilby et al. reported wide variability for manual palpation of ultrasonically identified lumbopelvic landmarks (Bland-Altman limits of agreement -27 to 26 mm), concluding that manual palpation of lumbopelvic points has limited validity [34]. These validity results suggest that further research needs to be conducted to evaluate whether radiographic methods of measuring global sagittal balance can be replaced with non-radiographic methods. This should be conducted with simultaneous non-radiographic evaluation of lumbar lordosis, which appears to be, in conjunction with pelvic tilt, the main contributor to global sagittal balance [2,8,13].
The reliability of the lower-cost and simpler spinal mouse and biophotogrammetric methods [16,32] has been investigated to a lesser extent than that of plumbline, IR motion analysis and surface topography. The spinal mouse system, which involves a wirelessly connected trackball, measures global sagittal balance by trunk inclination. Although validity studies are available for spinal mouse determined sagittal and coronal spine parameters, with high to very high correlation with radiographically measured coronal plane Cobb angle (ICC 0.87-0.96) [35], lordosis (r = 0.73) and kyphosis (r = 0.76) angles [36], none have evaluated the validity of trunk inclination. As the spinal mouse reliability studies included in the current review involved healthy adolescent and young populations, further studies involving older populations need to be undertaken. In a systematic review of non-radiographic measurement of thoracic kyphosis, Barrett et al. [16] also identified strong reliability for spinal mouse measurements. Barrett et al. concluded that the flexicurve was the most feasible non-radiographic method of measuring kyphosis, with high levels of reliability and validity; however, the flexicurve cannot be used for measurement of sagittal balance.
There remains considerable debate regarding the most appropriate method of measuring agreement within reliability and validity studies [37]. Only 30% of our included studies reported Bland-Altman plots, which is less than the 85% reported in Zaki et al.'s [37] systematic review of agreement within medical instrumentation testing methods. Zaki et al. cautioned researchers against utilising inappropriate methodologies to measure agreement because they are likely to result in incorrect conclusions and possibly detrimental patient care. They recommended reporting results using multiple methods of measuring agreement. The limits of agreement should also be translated into clinically meaningful limits, which were not detailed in any of our included studies.
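For reference, the conventional Bland-Altman 95% limits of agreement are the mean of the paired differences plus and minus 1.96 standard deviations of those differences. A minimal sketch follows; the paired values are invented for illustration and the function name is hypothetical.

```python
import statistics


def bland_altman_limits(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Return (lower limit, bias, upper limit) for paired measurements.

    Bias is the mean of the paired differences; the limits are
    bias ± 1.96 × the sample standard deviation of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("inputs must be of equal length with at least two pairs")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    spread = 1.96 * statistics.stdev(differences)
    return bias - spread, bias, bias + spread


# Invented paired measurements (mm) from two hypothetical methods.
plumbline_mm = [10.0, 12.0, 14.0, 16.0]
radiograph_mm = [9.0, 13.0, 13.0, 17.0]
lower, bias, upper = bland_altman_limits(plumbline_mm, radiograph_mm)
```

Whether limits of a given width are acceptable is a clinical judgement; one option is to compare them with the SVA thresholds used to define anterior sagittal balance.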

Strengths and limitations
Despite following the PRISMA guidelines, with all stages conducted by two independent reviewers and no restrictions on language or participant age, the possibility exists, as with all such reviews, that not all available articles were identified by the searches. We recognise that article quality may have been underestimated where authors adhered to items of the critical appraisal tool but did not report them. We stress the importance of publication date, especially for the technology-based methods, since progressive technological evolution limits comparison of results and accuracy between and within advancing methods. Further limitations should be considered when interpreting our review: owing to substantial variability in study methodologies, populations, reporting parameters and statistics, a quantitative meta-analysis could not be conducted.

Conclusion
Sagittal alignment, which is associated with increased pain and reduced quality of life, is an important concept emerging within the field of spine pain and deformity care. Non-radiographic methods of measuring global sagittal balance have low to very high reliability and, limited to plumbline testing, surface topography and IR motion analysis, low to high validity. Thus, although it is currently unclear whether these three methods can be used to evaluate sagittal balance pathology, they can be used with relative confidence for the monitoring of global sagittal balance. Further research needs to be undertaken to establish the value of non-radiographic methods of measuring global sagittal balance. These future studies should ideally include the ageing population and adhere to best practice in research methodology and in the reporting of psychometric agreement statistics.