Scientists at Texas A&M University, working with more than 50 collaborators, created the most challenging AI test to date to evaluate advanced language models on complex reasoning tasks. Initial results showed that even top AI systems struggled significantly, highlighting gaps in current capabilities even though the same systems ace simpler benchmarks. The benchmark, detailed in a peer-reviewed paper, pushes boundaries in AI safety and robustness research. Future iterations will incorporate multimodal challenges to further stress-test emerging models.