Testing for assumptions data dredging

8/19/2023

The objective of this article is to show that getting a p-value below the threshold of 0.05 is not that hard, and that a statistically significant result proves nothing by itself. Study results should always be interpreted in context.

The top 7 tricks that can be used to get statistically significant p-values include:

- Handling missing values in the way that benefits you the most
- Adding/removing other variables from the model

Below we will discuss each of these points in detail.

But first, note that none of these methods is inherently wrong. Still, they can be deceptive in some cases, especially when combined with HARKing (Hypothesizing After the Results are Known), i.e. when the study hypothesis is guided by the very data that will be used to prove it. The last step in HARKing is to present your findings as if you thought of the problem first and then came up with the solution.

So let's take a closer look at these "tricks" to see how each of them works:

1. Multiple testing

"If you torture the data long enough, it will confess to anything."

Multiple testing is based on the idea that: "If we test the relationship between 2 non-related variables, the probability of getting a p-value < 0.05 is 5%". This statement can be checked with a simple simulation, which consists of repeating the following steps:

- Create 2 NON-related, normally distributed, random variables.

[Figure: histogram showing the distribution of the p-values from this simulation]

As discussed above, when there is no effect, the probability of getting a p < 0.05 is 5%. This is the probability of having a false positive result. However, when there is a true effect, the probability of getting a p < 0.05 depends on the statistical power of your study. The more statistical power you have, the more likely you are to detect an effect (assuming there is one). One way of improving the statistical power is to increase the size of your sample (for more information about statistical power, see Statistical Power: What it is and Why it Matters). So once you know that a predictor has some effect on the outcome, it is not hard to get a p < 0.05 if you can increase the size of your sample.

Now you may be wondering: where's the trick if the predictor actually has a true effect on the outcome? You only need a VERY SMALL effect for this to work. For instance, if a variable X has a very small, practically non-significant effect on the outcome Y, you can still get a statistically significant result just by working with a large enough sample. It would be highly unlikely that the predictor under consideration has exactly zero effect on the outcome, or else it would not be taken seriously in the first place.

To prove this point, I ran another simulation to study 2 cases. First, the case where the coefficient of X, the independent variable, is 0.1 (i.e. the case where the true relationship between X and Y is represented by the equation: Y = 0.1*X + ε). For different sample sizes, repeat the following steps:

- Create a random, normally distributed variable X.
- Check if the p-value for the coefficient of X is …
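The multiple-testing claim (two unrelated variables give p < 0.05 about 5% of the time) can be sketched roughly as follows. This is a minimal illustration and not the article's own code: the choice of test (a Pearson correlation test using the Fisher z approximation), the number of repetitions, the sample size, and the function names `correlation_p_value` and `false_positive_rate` are all assumptions.

```python
import math
import random
from statistics import NormalDist


def correlation_p_value(x, y):
    """Approximate two-sided p-value for zero correlation (Fisher z transform)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Under the null, atanh(r) * sqrt(n - 3) is approximately standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_sims=2000, n_obs=100, alpha=0.05, seed=0):
    """Share of simulations where two unrelated variables give p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Two independent, normally distributed variables (no true relationship).
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        if correlation_p_value(x, y) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(f"Share of p-values below 0.05: {false_positive_rate():.3f}")
```

With enough repetitions the printed share should land close to 0.05, which is why running many tests on unrelated variables will eventually produce a "significant" result by chance alone.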
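The sample-size effect for the case Y = 0.1*X + ε can also be sketched in Python. This is a hedged illustration under stated assumptions: ε is taken to be standard normal, the slope p-value uses a normal approximation to the t distribution (reasonable for the sample sizes used here), and the sample sizes, repetition count, and function names `slope_p_value` and `detection_rate` are hypothetical choices not given in the article.

```python
import math
import random
from statistics import NormalDist


def slope_p_value(x, y):
    """Approximate two-sided p-value for the slope in a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    residual_ss = syy - slope * sxy
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    # Normal approximation to the t statistic (an assumption of this sketch).
    return 2 * (1 - NormalDist().cdf(abs(slope / std_err)))


def detection_rate(n_obs, beta=0.1, n_sims=200, alpha=0.05, rng=None):
    """Share of simulations where the coefficient of X gets p < alpha."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        # True relationship: Y = beta * X + noise, with standard normal noise.
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        if slope_p_value(x, y) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    rng = random.Random(0)
    for n_obs in (50, 200, 1000, 3000):
        rate = detection_rate(n_obs, rng=rng)
        print(f"n = {n_obs:5d}: share of p < 0.05 = {rate:.2f}")
```

The same tiny coefficient is rarely flagged at small sample sizes and almost always flagged at large ones, which is the point of the argument: the effect did not change, only the sample size did.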
