Given the body of experimental studies on gender bias in the evaluation of women in academia (e.g., Steinpreis, Anders, & Ritzke, 1999; Moss-Racusin et al., 2012), many expected implicit bias to be a major cause of women’s underrepresentation in math-intensive sciences (STEM). However, large-scale correlational studies have found no gender disparities in real-life hiring, manuscript, and grant outcomes (Ceci & Williams, 2011). Why might this be so? This paper discusses methodological challenges, going beyond the classic problems of external validity, that arise in extrapolating psychological effects and explanations to scientific communities. These challenges include more complex external validity issues raised by the introduction of multi-process models of cognition (e.g., implicit versus explicit social cognition), as well as the reflexive role that folk and experimental theories of social psychology play in guiding the behavior of scientists at the individual and community levels.