
Thread: Scientific integrity - finding significance

  1. #1 Scientific integrity - finding significance 
    Forum Bachelors Degree
    Join Date
    Jul 2008
    Posts
    420
    Hello,

    I have a question about scientific honesty/integrity. A postdoc with whom I was working recently told me that it is acceptable to omit data points that fall more than two standard deviations from the mean. At the time we were working with an N of 10, and she omitted one data point; the other nine fell within two SDs of the mean.

    Is this really a scientifically acceptable thing to do? I cannot imagine that it is, because if you do that, surely you are artificially reducing the variation within your data? The data would end up looking as though they fit a pattern that may not exist in reality.

    Thanks,

    Tri.


  3. #2 Re: Scientific integrity - finding significance 
    Moderator TheBiologista
    Join Date
    Aug 2008
    Posts
    2,569
    Quote Originally Posted by tridimity
    A Post Doc with whom I was working recently told me that it is acceptable to omit certain data points, if they fall beyond two standard deviations from the mean. [...] Is this really a scientifically-acceptable thing to do?
    There are loads of statistical methods for ruling out 'outliers', and loads of moral/philosophical justifications too; it's a divisive issue on both fronts. I wish I could say I've never dismissed a data point because it was clearly (in my view) the result of a bad measurement, but I regret that I have. Typically, when I've seen other people do it, the desire to remove the data point comes first and the rationale is built to serve that desire. In my view, it is not good practice to exclude a data point, whatever the reason.

    In principle, if you have enough well-measured data, reproduced enough times, the outliers should stand out as the noise they are and their impact on your analysis should be minimal. In practice, most scientists will discard a certain number of results outright (because an experiment went wrong, the protocol is generally 'noisy', or whatever) or will set thresholds for signal detection that follow the same principle as your postdoc's 2 SD rule. In my view, the most essential things are consistency and transparency: the method for exclusion should be the same every time, and it should be described in any reporting of the data, including publications.

    The above sounds quite conflicted, but I hope it has helped somewhat.
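    The consistency-and-transparency idea above can be sketched in a few lines: apply the same screen every time, and flag rather than silently drop. This is only an illustration, not anyone's actual protocol; the data, the 2 SD threshold, and the function name are all assumptions for the example.

    ```python
    # Minimal sketch of a 2 SD outlier screen: flag points for reporting,
    # don't silently discard them. Data and threshold are illustrative.

    def flag_outliers(data, n_sd=2.0):
        """Return (kept, flagged); points beyond n_sd sample SDs of the mean are flagged."""
        n = len(data)
        mean = sum(data) / n
        sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5  # sample SD
        kept = [x for x in data if abs(x - mean) <= n_sd * sd]
        flagged = [x for x in data if abs(x - mean) > n_sd * sd]
        return kept, flagged

    measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 9.7, 10.3, 10.0, 9.9, 14.5]
    kept, flagged = flag_outliers(measurements)
    print(f"kept {len(kept)} points; flagged {flagged} for reporting")
    ```

    Whatever threshold you choose, the point is that the rule is fixed in advance and the flagged values travel with the data into any write-up.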



  4. #3  
    Forum Bachelors Degree
    Join Date
    Jul 2008
    Posts
    420
    Thank you

  5. #4  
    Universal Mind John Galt
    Join Date
    Jul 2005
    Posts
    14,169
    Coincidentally, I am about to begin teaching a two-day class on data analysis for drilling-performance data in the oil and gas industry. The advice I give in this area echoes TheBiologista's:
    1. Outliers are typically obvious. (One could use a specific statistical method, but with the data sets we typically work with, the oddball numbers stand out.)
    2. Determine why those outliers are there. What made them different?
    3. If that difference is caused by an independent variable that is a subject of your analysis, include the data point.
    4. If it is not, then exclude it, but make clear that you have excluded it and state your reason for doing so.

    Arbitrary exclusion will ultimately conceal important information and force the data to fit your preconceived relationships, in which case the analysis becomes nothing more than an expression of your opinion dressed up in quasi-numbers.
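    Step 4 above can be sketched as a small bookkeeping habit: every exclusion is logged with its reason so the report can disclose both. Everything here (the data, the screening condition, the example reason) is illustrative, not from this thread.

    ```python
    # Sketch: exclusions are allowed, but each one is recorded with a reason
    # so it can be reported alongside the analysis. All values illustrative.

    analysis_data = []
    exclusion_log = []

    def include(point):
        analysis_data.append(point)

    def exclude(point, reason):
        exclusion_log.append((point, reason))

    for point in [10.1, 9.9, 10.0, 14.5]:
        if point > 14.0:  # illustrative screen; in practice, investigate the cause first
            exclude(point, "sensor saturated during this run")
        else:
            include(point)

    print(analysis_data)
    print(exclusion_log)
    ```

    The log is what makes the exclusion defensible: a reader can always re-run the analysis with the excluded points restored.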

  6. #5  
    WYSIWYG Moderator marnixR
    Join Date
    Apr 2007
    Location
    Cardiff, Wales
    Posts
    5,760
    Be careful of influential points, though, especially when a small number of points lie far away from the bulk of the data: they may give the impression of a trend that just isn't there.
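    That warning is easy to demonstrate: a single point far from the bulk of the data can swing a correlation from essentially zero to nearly one. The numbers below are made up for illustration, and the Pearson correlation is computed by hand so the sketch needs no external libraries.

    ```python
    # Illustrative only: one far-away "influential" point can manufacture
    # an apparent trend in otherwise unrelated data.

    def pearson_r(xs, ys):
        """Pearson correlation coefficient of two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    # Bulk of the data: no real relationship between x and y.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [1.9, 2.1, 2.0, 2.1, 1.9]
    print(pearson_r(xs, ys))                   # near zero
    print(pearson_r(xs + [20.0], ys + [9.0]))  # one influential point added
    ```

    With the extra point, the correlation jumps above 0.9 even though the underlying five points show no trend at all.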
    "Reality is that which, when you stop believing in it, doesn't go away." (Philip K. Dick)

  7. #6  
    Forum Freshman
    Join Date
    Nov 2007
    Posts
    16
    Setting the science aside for a moment, let's talk about the ethics. There is no definite answer where ethics are concerned; it is largely personal. This kind of exclusion rule is broadly accepted by the scientific community, but if you have a personal dilemma about it, then you should learn from the situation and follow your own ethics in your future research.
    After walking through the streets of this world, I swear I will never eat off my shoes again. A quirky scientist like us all www.uncomplicatedscientist.com

  8. #7 Re: Scientific integrity - finding significance 
    Suspended
    Join Date
    Aug 2010
    Location
    Fort Lee, NJ, USA
    Posts
    153
    Quote Originally Posted by tridimity
    A Post Doc with whom I was working recently told me that it is acceptable to omit certain data points, if they fall beyond two standard deviations from the mean. [...] Is this really a scientifically-acceptable thing to do?
    1) Throwing away the outliers reduces the chance of discovering something new and unexpected. That is what I was told when I was a postdoc at Columbia University, about 50 years ago.

    2) You might be interested in what I just posted as

    http://www.thescienceforum.com/viewt...540&highlight=

    Ludwik

  9. #8  
    Forum Radioactive Isotope skeptic
    Join Date
    Nov 2008
    Location
    New Zealand
    Posts
    4,843
    A much greater problem is posed by those who use the outliers to try to prove a point.

    This also works with whole studies. In the field of medicine, for example, a new drug or other treatment will sometimes be the subject of literally dozens of studies. By random chance, a few of those studies will occasionally produce results completely at variance with the majority. An unethical person may cite those outlier results in isolation to 'prove' a point, and thereby mislead people. Very unethical!
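    The effect skeptic describes can be put in rough numbers: if each study has a 5% chance of a false positive when the true effect is zero, the chance that at least one of n independent studies comes up "significant" grows quickly with n. A back-of-the-envelope sketch (the function name and the chosen n values are mine):

    ```python
    # Under a true null effect, with a per-study false-positive rate alpha,
    # the chance that at least one of n independent studies is "significant"
    # is 1 - (1 - alpha)^n.

    def p_at_least_one_false_positive(n, alpha=0.05):
        return 1 - (1 - alpha) ** n

    for n in (1, 10, 20, 50):
        print(n, round(p_at_least_one_false_positive(n), 3))
    ```

    By twenty independent studies of a useless treatment, the odds are better than even that at least one "positive" result exists to cherry-pick.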
