Maharishi University of Management

Institute for Natural Medicine and Prevention

Review of “Effectiveness of Meditation in Healthcare”

Harald Walach

This is an ambitious review that tries to identify all literature that has been published regarding meditation and meditation research. As such it is a highly commendable effort that certainly needs to be published at some time and should be done with as much accuracy and methodological rigour as possible. Before this review can go into print, I think it needs some serious revision and editing, which I outline below. As I have had only one week for compiling this review, it was not possible to do it with all the scrutiny and rigour that would have been necessary. However, I think the points I make below show that the review needs to be revised.

I.

What is completely missing from the review is a detailed description of the statistical methods and the statistical models used to do the computations. The two references that are mentioned, taken from the book by Egger et al. describing review and metaanalysis methodology, are a) by no means sufficient to justify the methods used in this review, and b) actually very short (one of the papers is only 5 pages long, and it is unlikely that this gives the reader sufficient information to see how the meta-analytic methods were applied). To this reviewer’s knowledge the statistical methods are not at all up to date, and a lot of the discussion around how to compile metaanalyses out of single studies, or to combine effect sizes across studies, is not even mentioned, let alone referenced. For instance, the Handbook of Research Synthesis, which is one of the primary texts in the behavioural science literature, does not seem to have been taken into account; otherwise a lot of the review would have been conducted differently. I recommend thoroughly revising the general statistical methods, doing the relevant background reading, and then going over the statistical methods again.

II.

The way effect sizes have been handled does not seem to be straightforward. For continuous measures, sometimes standardised mean differences, sometimes weighted mean differences are used, and it is not justified or mentioned why this is done, and why sometimes one and sometimes the other effect size measure is used. Also, the relevant computations are not mentioned or referenced. One would expect this in the appendix at least. It would be a standard procedure in the behavioural science literature to use standardised mean differences, for instance using a conservative estimate such as Hedges’ g and the appropriate method of compiling these effect sizes, indicating the precise models and the reasons for using them.
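
As an illustration of the kind of computation that could be documented in an appendix, the following is a minimal sketch of a standardised mean difference with the small-sample correction usually associated with Hedges' g. The function name and the example numbers are hypothetical and are not taken from the report; the correction factor uses the common approximation J = 1 - 3/(4df - 1).

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for two independent groups.

    Hypothetical helper for illustration; not the report's own procedure.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across treatment and control groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    # Uncorrected standardised mean difference (Cohen's d)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor (approximation)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, scaled by the squared correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return g, j ** 2 * var_d


# Made-up example values, chosen only to show the call
example_g, example_var = hedges_g(10.0, 5.0, 20, 8.0, 5.0, 20)
print(f"g = {example_g:.3f}, variance = {example_var:.4f}")
```

Stating the formula at this level of detail, together with the reason for preferring it over a weighted mean difference, would let a reader reproduce each entry of a forest plot.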

III.

The authors have done a magnificent job to really compile a lot of the literature, and they also acknowledge the lack of non-English literature. I think this is justified and would not pose a problem by itself. However, they claim to have eliminated multiple usage of studies. But with a single in-depth check I performed I found several instances of multiple usage of either studies that have been used twice or effect size calculations from studies that were not actually intending to alter the measure in question.

IV.

From the in-depth checks I have made in only two instances, I fear that there has been a problem with coding studies and with the nitty-gritty of metaanalysis that is actually the most important part of it. While I have no doubt that the calculations and all the statistical procedures have been carried out properly by the programmes mentioned, I have severe doubts that the actual data extraction and the interpolation of effect sizes or standard deviations that are necessary for a good quality metaanalysis have been done correctly. I therefore think that, before such a contentious review and health technology assessment report can be made public and become the basis for grave decisions by health authorities, there needs to be a thorough and complete check of primary data against the database that has been used for compiling the final statistics. Until this is done I recommend not publishing the report. It might do damage to the reputation of the researchers involved, for if anybody checks the primary data and finds as many mistakes as I have, then the review is certainly compromised. An alternative to this second checking of primary data against the database might be to produce an audit trail or a monitoring trail of the procedures involved. However, I doubt that such a trail is available, judging from the in-depth specimen that I have taken.
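
Part of such a check of primary data against the database could be automated. The following is a minimal sketch, assuming that both the extraction database and a second, independent reading of the primary papers are available as nested dictionaries. The function name, study labels, field names, and values are all hypothetical; this is not a description of how the report's authors worked.

```python
def audit_extraction(database, primary, tolerance=0.005):
    """List every value in `primary` that the database lacks or contradicts.

    Returns tuples of (study, field, entered value, published value).
    Hypothetical helper for illustration only.
    """
    mismatches = []
    for study, published_fields in primary.items():
        entered_fields = database.get(study, {})
        for field, published in published_fields.items():
            entered = entered_fields.get(field)
            # Flag missing entries and entries that differ beyond rounding
            if entered is None or abs(entered - published) > tolerance:
                mismatches.append((study, field, entered, published))
    return mismatches


# Made-up records, only to show the shape of the input and output
database = {"Study A": {"mean_change": 1.0, "sd": 2.0}}
primary = {"Study A": {"mean_change": 1.0, "sd": 2.5}}
for row in audit_extraction(database, primary):
    print(row)
```

The list of flagged rows, kept together with the decision taken on each, would itself form a simple audit trail.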

V.

This in-depth specimen refers to two accidentally selected metaanalyses which I have used for checking the procedures. These choices were mainly based on intuitive reasons and on the availability of primary material. The first example I’m going to go into is a metaanalysis of Transcendental Meditation versus health education and the effects on blood pressure, which is detailed on pp. 102 & 103 (table of studies), and p. 107, the metaanalysis itself. The second example is the meta-analysis of the Relaxation Response on blood pressure. Let’s first turn to the table, pp. 102 & 103.

In the second line a study by Calderon (2000) is mentioned as a study of Transcendental Meditation affecting hypertension. I was not able to retrieve this study, as it is an unpublished doctoral dissertation. However, from the title it is obvious that it is a study that is targeting lipids in borderline hypertensives, and it is likely that this study uses blood pressure only as a secondary outcome. I wrote an email to one of the lead researchers, Dr. Schneider, who provided me within one day with: a) the confirmation of my suspicion that blood pressure was not a target, and b) a peer-reviewed publication of that study that would have been available (Calderon, R., Jr., Schneider, R. H., Alexander, C. N., Myers, H. F., Nidich, S. I., & Haney, C. (1999). Stress, stress reduction and hypercholesterolemia in African Americans: a review. Ethnicity & Disease, 9(3), 451-462.) This shows that although the formal and technical side of the background research was done thoroughly, the informal side, such as talking to lead researchers and tracking grey literature through contacts, was probably not performed with the same degree of professionalism.

The next line down, Castillo-Richmond (2000), is actually available, and it is clear from the paper that this is a study that is targeting primarily atherosclerosis in hypertensives, and that this is a subset of data of another study, very likely the big trial published by Schneider in 2005, but it is not clear which one. This impression was confirmed by the lead author. However, what is clear from the publication is that this trial did not target hypertension but thickness of the artery as a measure of coronary heart disease. Hypertension was used as a secondary outcome, and I don’t think it is appropriate to mix studies that have used secondary and primary outcomes just because it is the same outcome measure. Also it should have been investigated which larger study this study was part of, and if it is really an independent study, which it is definitely not, it should have been mentioned under a different heading and not under “Hypertension”.

On p. 103, 3rd line down, where the Schneider (1995) studies are referenced, there is one reference which is indicated as referring to this study. This is in fact a publication from 2001 describing the protocol of a new study, very likely the protocol of the study that was published in 2005, in the line below. Hence, what has likely happened is that reference 70 has slipped by one line and should have been attached to Schneider (2005).

In the column giving information about allocation concealment, these two studies, Schneider (1995 & 2005), are listed as having unclear allocation concealment. I think that this is wrongly coded. For example, the Schneider (2005) publication states that the allocation was done by a computer algorithm. According to all conventions in published trials, allocation by computer algorithm is regarded as safe and concealed, because nobody can know which group the next patient is going to be allocated to. This is the definition of allocation concealment, and hence the study should have been coded as having fully concealed allocation. Only a cursory knowledge of trial methodology should be sufficient to arrive at this conclusion. The fact that this coding has slipped in, apparently multiple times, shows that the data extraction was done by researchers who do not have adequate competence for their job. As allocation concealment is one of the important items in the 5-item Jadad scale (we are going to discuss this later on), such a mistake is grave as it leads to a wrong assessment of methodological quality of the trials included. As I have only checked parts of one table and have found about 50% of the data I checked to be wrong, this does not inspire confidence in the rest of the coding.

VI.

Now let’s turn to the metaanalysis proper described on p. 107. One minor detail, which applies to all the reporting of metaanalyses here, is that while in the whole report a numerical superscript referencing system is used to refer to studies, in the metaanalyses themselves it is the author-date convention that is used, which makes it extremely difficult to actually follow which studies have been used in the primary analysis and forces the reader to refer back to the other tables. I would recommend changing this, or at least supplementing the author-date system with the numerical referencing system so that the reader can actually check which studies are meant.

Looking further, at figure 3 on p. 107, several inconsistencies, numerical typing mistakes and problematic decisions are visible:

  1. Schneider (1995): although the data entered for the TM group are correct according to the result table of the original publication, the authors also reported adjusted means. Here the non-adjusted means have been entered. While this is not a very serious mistake it shows that decisions have to be made in a stepwise procedure, and it is by no means clear to the outside observer why non-adjusted means have been chosen and not adjusted ones, while adjusted means are normally to be taken to be the more robust and sensitive estimators of an effect.
  2. The next study, Kondwani (1998), is actually available as a peer-reviewed publication (Kondwani et al. (2005) Left Ventricular Mass Regression with the Transcendental Meditation Technique and a Health Education Program in Hypertensive African Americans. Journal of Social Behavior and Personality, 17, 181–200). Blood pressure was not the target of the study, and also, this reference was not found by the search strategy. A brief contact with the lead researcher, such as the one I made, would have been sufficient to secure the information. The fact that this was not done shows that some ways of actually getting all literature have not been used.
  3. According to Dr Schneider, the next study, Calderon (2000), was a pilot project of lipid lowering. Even though blood pressure was measured it certainly was measured as a secondary outcome in this study.
  4. The study by Castillo-Richmond (2000) was in fact a study on reduction of carotid atherosclerosis as measured by reduction of the thickness of the intima media, and blood pressure was only a secondary outcome.
  5. The study by Schneider (2005) contains wrong data. The reduction of blood pressure in the TM group was not 2.13 but 3.12, and the standard deviation was not 13.82 but 11.17. Also the reduction in the control group was 0.9 and not 1.08, and the standard deviation was 11.43 and not 13.86. It appears that here in this line a mistake in transcribing data either at the data entry stage or at the extraction stage has been made. According to one of the researchers at the Maharishi University of Management, Dr. Schneider, a series of other studies had been done that were measuring blood pressure, which have not been included in this particular metaanalysis. Some of them do not seem to have been located and are not contained in the list of rejected or included articles. This leaves one with dubious feelings about a lot of the elements of this metaanalysis. If the metaanalysis reported in fig. 3 was meant to be inclusive of all studies using blood pressure as outcome, no matter whether it was primary or secondary outcome, then it seems to have missed a lot of studies that are relevant to that question. If it is only about clinical relevance of TM in clinically ill hypertensives then studies are included that do not seem to target high blood pressure in the first place, and the question arises why they have been included. Finally, the fact that studies that have been published in the peer-reviewed literature and are readily available do not seem to be included in that analysis questions the thoroughness of the literature screening and the criteria for inclusion of studies into particular metaanalyses.
  6. The detailed checking of one particular metaanalysis reveals that at some point the work has not been checked thoroughly. Numerical data errors such as the one spotted in this analysis should not occur and should have been spotted by a monitoring or checking process. The fact that the inclusion of studies in metaanalyses is not clear to the outside observer testifies to the fact that the protocol or the target of single metaanalyses does not seem to have been clear at the outset. This raises the question whether there is a robust log of decisions made during the process of these metaanalyses and reviews. Ideally one would like to see a predefined protocol which should have been appended to the material so that an external reader and observer can check what predefined steps had been taken. Also a metaanalysis of that kind depends highly on ad hoc decisions which have to be made, such as how standard deviations are derived if they are not reported, or which of two measures is used if several are reported, etc. From just looking at the metaanalysis and checking against the primary data these decisions were not always clear to this reviewer in this particular analysis. This does not inspire trust and confidence in the robustness and validity of the other metaanalyses.
  7. I checked a second analysis, at random, the one about the Relaxation Response in high blood pressure on p. 115, Figure 17. Immediately, some problems become visible: The standard deviations of the measures that are actually not reported in the original papers are estimated as 15 for the systole and 13 for the diastole. On what grounds? No reason or algorithm for that decision is given (although it might be a wise one). In the Surwit study, only the systole is used, although the diastole was also reported. Why? The Hager study actually offers several time points. Only one was used. It is unclear why. Normally there would be a rationale given, such as “only conservative measures”, or “always measures favourable to the study”, or whatever, would be used. Nothing of the kind is indicated here.
  8. It was not possible to check all the analyses in detail since this would have been a time-consuming exercise that would warrant several days of intensive research. However, from that brief check I cannot recommend publication of this document as it stands. What is necessary is a thorough checking of each step of the metaanalysis right from the point of data extraction into the build-up of the database against the paper trail, if available, of how these decisions were made and why certain studies were included or excluded. Only if this counterchecking and monitoring has been documented to an external monitor or reviewer, such that mistakes are clarified and decisions are transparent and valid, should these metaanalyses go forward.
  9. On a much broader basis, this analysis also questions the analytical procedure as a whole. Although there is a lot of debate about what metaanalyses should do and how metaanalyses should be implemented, and although a valid point can be made for the decision and the approach used by the researchers, it seems that one analytical decision went unquestioned: this decision seems to have been to take every single outcome independent from the study in which it was reported and independent of the context this outcome had within one study and meta-analyse it separately, together with other studies. This leads to the mix-up of studies which specifically targeted one outcome and studies that just reported that outcome as an additional one. While this strategy may be useful in the case that outcomes were always well-chosen and targeted, it may lead to confusion in the case of a situation where an outcome was just measured as an exploratory device, which happens frequently in the behavioural literature. If such outcomes are then combined with studies where a certain outcome had been targeted explicitly, it is bound to dilute a potential effect, and the summary statistic is bound to be invalid. Hence a more sensible and also fairer analytical and evaluation strategy would have been to meta-analyse on the basis of single studies. That means that for each single study a single effect size would have to be derived, based either on the primary outcome or on an aggregated effect size for all outcome measures in a study, depending on how the study was actually conceived and what the primary aim of the study was. Depending on that decision, outcomes of a single study could have been averaged, combined by weighted averages, or represented by the primary outcome alone. By additionally coding for relevant variables such as diagnoses or design characteristics it would have been possible to decide which variables are really moderators in a meta-analytic approach.
     I’m quite aware that this is a completely different approach, but what I do not see in this document is a robust, cohesive, and convincing argument for the strategy adopted here. Moreover, when looking into the detailed analyses the whole analytical strategy looks rather haphazard and ad hoc. I suggest that the analytical strategy is completely revised and fully argued for before it is published, such that the reader can make up his or her mind as to whether the analytical strategy was appropriate. My suggestion would be that for the quantitative metaanalyses the study is chosen as the unit of analysis and not the outcome. Then from there studies can be combined that are targeting the same outcome, or all studies can be combined that are researching a particular meditative technique, and from such global omnibus analyses sensitivity analyses can be derived which can clarify the expected heterogeneity, or meta-regressions can be applied that clear up some of the variance to be expected and then clarify some of the important moderators, which was not really possible with the strategy employed here.
  10. The authors make a point that meditation is a heterogeneous construct, and they do a very good job in describing certain types of meditations and in describing both their commonalities and differences. When reading this passage it becomes clear that there is actually quite some commonality in all the meditative techniques employed, based on the wilful direction of intention, the methodological approach, and the use of techniques to modify attention and states of consciousness. This is also the outcome of the Delphi exercise which the authors employed and which is a highly commendable enterprise. Hence it would actually make sense to do a global omnibus analysis on all studies to find out whether there is a quantifiable effect of meditation on all outcomes checked. This is similar to what has been done in the psychotherapy literature when the question was asked whether psychotherapy has any effect at all. This can then be qualified by differentiating the effect into meditation groups or patient groups or according to designs. I recognise that this is a completely different strategy and that the authors may have had good reasons to not follow such a strategy. However, I think what is necessary in a document like this is a cohesive argument for the analytical strategy chosen, and this is exactly what is missing.
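
To give a sense of how much the transcription discrepancy noted for Schneider (2005) in point 5 could matter, the following sketch compares the values as entered in figure 3 with the values as published, using only the four means and four standard deviations quoted there. The group sizes are not quoted above, so the standardised difference below assumes equal group sizes; that assumption, and the function name, are illustrative only.

```python
import math


def mean_difference_and_d(change_t, sd_t, change_c, sd_c):
    """Return (raw difference, standardised difference) between two groups.

    The pooled SD assumes equal group sizes, which is an assumption made
    here for illustration because the group sizes are not quoted above.
    """
    diff = change_t - change_c
    sd_pooled = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return diff, diff / sd_pooled


# Values as entered in figure 3 versus values as published (from point 5)
as_entered = mean_difference_and_d(2.13, 13.82, 1.08, 13.86)
as_published = mean_difference_and_d(3.12, 11.17, 0.90, 11.43)

print(f"entered:   difference = {as_entered[0]:.2f}, d = {as_entered[1]:.2f}")
print(f"published: difference = {as_published[0]:.2f}, d = {as_published[1]:.2f}")
```

Under these assumptions the between-group difference in blood pressure reduction roughly doubles (about 1.05 as entered versus about 2.22 as published), which shows that a single mis-transcribed line can visibly shift a study's contribution to the pooled estimate.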

VII.

This leads me to the next point, which is actually the background assumptions underlying the metaanalyses and the review. They seem to be relatively close to the medical model, assuming there is a causally active intervention for a defined disease entity. While this may be true for very targeted pharmacological interventions, although it may be doubted even for those, it is certainly not a particularly useful assumption for assessing the effectiveness of behavioural interventions that are quite holistic, such as meditation approaches. In this type of research what is normally done is to test for a generalised effect by using a particular measure for a particular population. But as testified by most studies, they use a series of domains of measures to find out about effects, and hence my argument that a more sensible analytical strategy would have been to use the study as the unit of analysis and differentiate from there. We have done that in our metaanalysis of mindfulness meditation by differentiating between psychological and physical domains and have actually found quite a robust sizeable effect of d=0.53 with not much heterogeneity. This has been replicated by an independent group, and I would be very surprised if this compilation of the literature came to a different conclusion. My guess is that part of the heterogeneity isolated by the authors of this review is due to the inappropriate combination of studies that used outcomes as primary and secondary outcomes or had different patient populations. My hunch is that using single studies as the unit of analysis, combining effect sizes first on the level of the study and then across studies, and potentially across disciplines, will yield a much clearer and much easier to interpret picture of the literature.
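
The two-stage strategy described above (first one effect size per study, then a combination across studies) can be sketched as follows. This is a minimal illustration, not the procedure of the report or of the mindfulness metaanalysis mentioned above. It assumes that outcomes within a study are averaged with an assumed common correlation `r` between them, and that studies are pooled with a DerSimonian-Laird random-effects model; the function names and the value `r = 0.5` are assumptions made for the example.

```python
import math


def study_level_effect(effects, variances, r=0.5):
    """Average several outcomes of ONE study into a single effect size.

    `r` is an assumed correlation between outcomes within the study;
    it is a placeholder, not an estimate from any real trial.
    """
    m = len(effects)
    mean_effect = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                # Covariance term between two correlated outcomes
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m ** 2


def random_effects_pool(effects, variances):
    """Pool study-level effects; returns (pooled, se, tau2, I2 in percent)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures the observed heterogeneity between studies
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird estimate of between-study variance, floored at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Made-up example: two studies, the first reporting two outcomes
per_study = [
    study_level_effect([0.40, 0.60], [0.04, 0.04]),
    study_level_effect([0.30], [0.05]),
]
pooled, se, tau2, i2 = random_effects_pool(
    [e for e, _ in per_study], [v for _, v in per_study]
)
print(f"pooled = {pooled:.2f}, se = {se:.2f}, tau2 = {tau2:.3f}, I2 = {i2:.0f}%")
```

With the study as the unit, moderators such as diagnosis, design, or whether an outcome was primary or secondary can then be coded per study and examined in sensitivity analyses or meta-regression.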

VIII.

I do find it difficult to follow the quality assessment used to rate the studies. The Jadad score is already under very severe criticism even in the conventional metaanalysis and clinical trials literature, because everybody recognises that although it is “validated” it is not very useful because of its very restricted range.

The Jadad score places a lot of emphasis on some important indicators of internal validity, which are actually very easy to see and assess and leaves out a lot of other indicators of study quality that are comparatively difficult to assess and that are equally important if not more important than some of the quality indicators it uses. That is the reason why a lot of competent review groups do not use the Jadad score, or if they use it they use it very cautiously. Let’s take one single example.

The Jadad score places big emphasis on double blinding. While this is completely justified for pharmacological interventions or for interventions where blinding is actually possible, it is not at all sensible to employ as a criterion in behavioural interventions where blinding is not only difficult but counterproductive. There is quite some consensus in the behavioural research literature that blinding of behavioural interventions is only useful if it can be done without altering the context in which the treatment is delivered, i.e. without compromising external validity. But this is normally not the case because if patients are blinded as to the treatment delivered then deception has to be employed, and deception is counterproductive in therapeutic relationships.

Hence, blinding of patients does not play an important role in the behavioural research literature. Instead, what people are trying to achieve is to place emphasis on unbiased outcome measures by using blinded observers, by using outcome measures that are objective, by triangulating data from several perspectives of measurement, or by using measures that are comparatively robust against bias. None of these procedures would have shown up in the Jadad score, and if a study had gone to great lengths to arrive at unbiased estimates of effect sizes in these ways, this would have been less valuable in the scoring system used here than the simple mentioning of double blinding, whether or not it was useful, sensible, possible, or in fact true.

As my quick analysis of table 24 has shown, the rating of another item which is important in the Jadad scale, allocation concealment, was wrong in several instances anyway. This is due to the fact that allocation concealment, although very important, is normally inferred indirectly. If an algorithm has been used that does not allow inference as to which group a patient is going to be assigned to, this is by definition allocation concealment. This consensus is actually used by most writers who are hard-pressed for space, so they do not repeat obvious and clearly visible information. However, in the way this has been used in this review, it looks as if this point has only been coded if it has been explicitly mentioned. This is actually supporting bad medical writing, because it is looking only at information that has been verbally reported and not inferred from the text. A good medical writer is trying to not repeat or explicitly mention stuff that can be gleaned from other parts of the information, but tries to save his space for more important parts of information. That way the Jadad score and the way it has been used in this review and metaanalysis penalises good writing and potentially good studies and supports cookbook writing that mentions catchwords. I do not think that this is an appropriate way of dealing with the complex issue of study quality.

IX.

There has been some consensus in the behavioural metaanalysis literature that the assessment of validity of studies is a highly complex issue and that the richer the assessment the more likely it is that important modifiers of effect size heterogeneity can be identified. A five-item scale such as the Jadad score is certainly not going to do the job. Even in the clinical trials literature there are more sophisticated measures around, such as the Detsky score or other scores. Also the Cochrane criteria are much more sophisticated than the Jadad score. Using this scale just because it is validated is actually a refusal to employ one’s own critical thinking. Most importantly, what is left out in this scale is external validity. If validity is assessed at all then it is not appropriate to just assess internal validity. It is possible to have a Jadad score of 5 with a completely invalid and irrelevant clinical trial. If someone is able to report the items properly it might be the case that a small pilot trial ends up with a high Jadad score and a complicated final trial that gives us much more information ends up with a very low Jadad score although it might have been methodologically much better. Other indicators of internal validity, such as whether the study was actually conducted well, implemented well, whether it had a realistic and appropriate power analysis, was based on previous pilot data or was just a one-shot study, are not covered by the score. A much more important point of criticism is the model validity. At no point in the analysis and at no point in the Jadad score is the appropriateness of the treatment, the integrity of the treatment delivery, the dosage of the behavioural treatment, etc. mentioned. These are very important indicators as to whether a study was implemented well and whether it is meaningful.

As an example: a study might have been completely blinded, allocation concealed, and reported intention to treat without loss to follow-up, but the meditation might have been implemented so badly that most of the patients left early and did not actually finish the course, while the control patients might have finished part of it. Such a study would be able to achieve a high methodological score without having any meaning whatsoever because the treatment in question was not actually validly tested.

These are very simple and very quick arguments regarding a very complex issue, namely the issue of validity and quality testing of trials. I find it utterly inappropriate to use a simplistic scale such as the Jadad scale to make decisions and judgements on methodological quality of complex interventions around any type of meditation research. If quality judgements are being made within that context, these judgements should be based on complex assessments both of internal validity and external validity alike. It may be that in order to assess external validity properly a scale would have to be developed first because to my knowledge there is not a good scale assessing external validity of meditation research. But such are the vagaries of research synthesis that sometimes the methods have to be developed before they can be used appropriately. It is always a bad idea to just use methods that are available and not think further whether the methods available are also the ones necessary and appropriate. In this context I feel it is only appropriate to use any judgement on study quality if it is also combined with a measure of external validity, and if the measure of internal validity is more complex and depicts a situation more truthfully than the Jadad score is able to do. Otherwise I think the researchers should not say anything about methodological quality because a judgement based on the Jadad score only is wrong. From my personal knowledge of some of the clinical trials in this area I am quite sure that they were conducted with at least as much rigour and methodological sophistication as some of the pharmacological trials that would reap Jadad scores of 5.

X.

This leads to a very general criticism of this metaanalysis. The way all these issues have been handled suggests to the unbiased outside observer that there was a lack of discussion, argument and thorough preparation before that report was started. An issue like effectiveness of meditation in general is highly complex and warrants a very thorough analysis of the methods before actually embarking on the enterprise. The way the results have been produced, the methods that have been employed, and the justifications that have been given for the employment of these methods all fall short of the high standards that are required by both the public and researchers in order to base decisions on them. My general recommendation therefore is to not publish this review at all before all these issues have been resolved, or properly argued for, or amended. I suggest that an external monitor or review board helps the researchers in checking these. I understand from reading the report that there was such a panel in place, but it is actually not clear from reading the report that the panel had much input into the methodology of the review and the metaanalysis. Rather, the panel seems to have been employed mainly for the resolution of conceptual issues.

Although the peer review process that has been started as a quality assurance process for the final product is a very commendable step, I think a similar process is necessary to assure the integrity of the methodology, of the data handling and synthesis processes, and of the compilation of the final report before it can go into peer review. One way of achieving this would be to convene a panel of experts either via phone conference or e-mail conference to discuss the steps and issues I have raised. As most of the information is probably available already, this would be a post hoc quality assurance policy that would at least improve the quality of the data available and the way it is reported.

XI.

In what follows, I mention some minor issues for improvement of the document as it stands and some other points for discussion or improvement. These I would see as rather minor and discretionary.

  1. P. v: in the abstract result section it has to read “yoga help to reduce stress”. Final line: “execution of studies”.
  2. The rendering of the diverse meditation techniques from p. 29 onwards is in general quite correct. However, I have some problems with the different emphasis given to each. While a tradition such as Vipassana or Zen meditation is covered in two or so pages, yoga is treated in much more depth over five pages. To the outside reader this conveys the impression that yoga is more complex than Vipassana or Zen meditation, whereas it is just a matter of how deeply the issue is analysed and reported. For instance, in Zen meditation there are different breathing techniques and different ways of doing meditation. One could distinguish between the Soto and the Rinzai school, with the Soto school mainly doing shikantaza and the Rinzai school mainly doing koan training. However, koan training involves meditation on a wato, which is similar to a mantra. Finally, some schools of Zen meditation do not attempt to alter breathing at all, nor do they employ particular breathing techniques. Along similar lines, one could enlarge the information about Vipassana meditation by looking at techniques such as tagging, reporting or naming mental events, etc. I suggest that it would be appropriate to balance these issues more. Also, it is not true that Vipassana or Zen meditation are strictly seated meditations. Most of these practices also have certain formalised ways of movement, such as kinhin, the walking meditation in Zen.
  3. On p. 49 the issue of attention and its object is also more complicated, because it is not true that focusing attention on objects and mindfulness can be clearly separated. Some traditions, such as Zen, have phases where concentration is used, and for that certain elements, such as counting or concentration on a wato, are employed, while at other stages broad, spacious mindful attention is encouraged. The same is true for Vipassana meditation.
  4. I find the treatment of spirituality and belief very superficial. Spirituality normally refers to a spiritual practice that encourages the way we relate to the world as a larger whole being part of ourselves, while a belief system refers to a cognitive metastructure that is derived out of spiritual experiences.
  5. P. 78: prison inmates are named under “healthy populations”. This could be debated vis-à-vis the fact that in some studies more than half of prison inmates are diagnosed with some serious mental disorder.
  6. Pp. 142 & 143: substance abuse is discussed. However, table 32 contains mainly information about blood pressure studies, dietary intake and lipid lowering, and no substance abuse is covered in that table. Might it be that some information has been missed here?
  7. Reference section: the referencing is inconsistent. Sometimes the full publication year is given, sometimes only the last two digits. I would recommend always giving the full publication year, otherwise publication year and publication volume might be mixed up. In a lot of cases the author information is either wrong or insufficient. While the first author is probably always correct, the second authors are sometimes squashed into a series of capital letters, such as in references 46, 48, 49, 335, 342, 357, 358, etc. It is necessary to check the consistency and integrity of the reference section. Also, I would recommend ordering the reference section in a more reader-friendly way, using either a numerical or an alphabetical system, so that references can be found easily. At the beginning of the referencing, the system is in sequence, and then it starts to become numerical. It might also make sense not to restart the numbering for the rejected studies, otherwise readers might be confused if they happen to land in the wrong reference section.
  8. I think the appendix should contain the study protocol that was used prospectively, and information about the most important general decisions. For instance, what was the rule if a standard deviation was not reported? Was the most conservative estimate that was empirically available used, or was an approximate estimate derived from the statistics? Etc. I also think a table would be necessary that contains the most important statistical data for each study, such as the main outcomes extracted, the effect size measures used, etc., so that an interested reader could use the information to replicate the metaanalysis and the separate analyses done.
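
To illustrate the kind of documented decision rule meant in point 8, the following is a minimal sketch in Python, not a description of what the authors of the report actually did. It assumes the standard conversions from a standard error or a 95% confidence interval of a mean to a standard deviation, with the large-sample value z = 1.96; for small samples a t-distribution quantile would be more appropriate. All function names and the layout of the study record are hypothetical.

```python
import math


def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Standard deviation from a confidence interval of a mean.

    Assumes a symmetric interval built as mean +/- z * SE; z = 1.96 is the
    large-sample value for a 95% interval (an assumption of this sketch).
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


def resolve_sd(study):
    """Return (sd, rule) for one study record (a hypothetical dict layout).

    The rule string records which decision was applied, so that it can be
    listed per study in an appendix table.
    """
    if study.get("sd") is not None:
        return study["sd"], "reported"
    if study.get("se") is not None:
        return sd_from_se(study["se"], study["n"]), "derived from SE"
    if study.get("ci") is not None:
        lower, upper = study["ci"]
        return sd_from_ci(lower, upper, study["n"]), "derived from 95% CI"
    return None, "not recoverable"
```

Recording the rule applied alongside each derived value is what would allow an interested reader to retrace the computations study by study.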

I am sorry if this review is rather critical of some decisive elements of the report, and if I have not done justice to the work of the researchers I apologise. However, from the brief checks I was able to make in the very short time available to me, the results did not inspire enough trust to recommend publication of the review as it stands. I hope I have made clear why this is so, and I can assure the researchers that my points have been raised both in good faith and in a fully cooperative spirit, in order to protect the reputation of the researchers, the institution and the funding source. If further help or input is wanted, I am happy to provide it.

Prof. Harald Walach
University of Northampton
School of Social Sciences &
Samueli Institute for Information Biology
Boughton Green Road
Northampton NN2 7AL
Tel +44-1604-89 2952
Fax +44-1604-722067
Email: harald.walach@northampton.ac.uk
