Anatomy of a moral panic
A year ago, after noticing an alarming shift in both the number and tone of news articles being served by search engines like Google and Bing, I cobbled together a system that would log news stories as they were added to Google News in real time. The project garnered a decent amount of attention and led to a Google representative reaching out to check my claims that Google was serving extremist content. We corresponded for about a week as I shared my methodology and code (all of which is accessible on GitHub here).
We came to an impasse when the Google representative concluded that the data couldn’t possibly be showing a complete picture of the news stories in the index or, alternately, that it captured the index so completely that it didn’t appropriately translate into the user experience, which would call into question the utility of the data as evidence of behavioral influence. I was, however, assured that there would be some investigation into the fact that groups which had been formally classified as hate groups by the Southern Poverty Law Center’s project on extremism were being served as legitimate news outlets. (Sadly, this has not come to fruition.) As with a lot of my projects, I threw myself into this one with fervor in the evenings and then, as time wore on, left it running quietly in the background and mostly forgot about it.
Last night, I reran the code and aggregated a full year’s worth of data, which is available for free to researchers here.
When I found ninety-seven thousand observations in the dataset, I was stunned.
But first, let me orient you to the data, from last year’s blog post:
I set several Google News alerts to three keywords, each chosen for its relative frequency in articles with a given valence. While up until recently “transgender” was a positive or neutral term favored by center and liberal news outlets, terms like “biological sex” have been favored by the UK media attempting to focus on major arguments like prisons and bathrooms. A third keyword, “gender identity”, was chosen as being neutrally valenced, although this too appears to be changing.
Each Google Alert feeds directly to RSS, split between three regional specifiers: UK, US and “any region”. The RSS feeds are monitored by IFTTT, which adds each new entry to a Google Sheet. The sheets are then parsed using a private OAuth token and read into R as a single dataframe, with graphics rendered by ggplot.
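For readers who want a feel for that last step, here is a minimal sketch of stacking per-feed sheets into one data frame. It is not the code from the repository: the toy data frames stand in for sheets that would really be read from Google Sheets, only three of the nine keyword and region combinations are shown, and the `keyword` and `region` columns and the `tag_sheet` helper are names I am assuming for illustration.

```r
# Toy stand-ins for three of the per-feed sheets. In practice each one would
# be read from Google Sheets; the rows and example.com-style URLs are invented.
sheets <- list(
  list(keyword = "transgender", region = "US",
       data = data.frame(EntryURL = c("https://a.example/1",
                                      "https://a.example/2"),
                         stringsAsFactors = FALSE)),
  list(keyword = "biological sex", region = "UK",
       data = data.frame(EntryURL = "https://b.example/1",
                         stringsAsFactors = FALSE)),
  list(keyword = "gender identity", region = "any",
       data = data.frame(EntryURL = "https://c.example/1",
                         stringsAsFactors = FALSE))
)

# Hypothetical helper: label every row with the alert it came from.
tag_sheet <- function(s) {
  d <- s$data
  d$keyword <- s$keyword
  d$region <- s$region
  d
}

# Stack all tagged sheets into a single data frame.
news <- do.call(rbind, lapply(sheets, tag_sheet))

# Cross-tabulate rows by keyword and region as a quick sanity check.
print(table(news$keyword, news$region))
```

Keeping the keyword and region as columns, rather than as separate data frames, is what makes the later one-line queries against a single `news` object possible.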
About those 97k observations: a quick query restricting to only observations with unique URL strings in the R console returns 55.5k results.
This means that well over fifty-five thousand individual, unique articles about transgender people were recorded during the year of observation.
This doesn’t strip out all duplicates: many news outlets append unique tracking identifiers to their URLs, so the same article can be recorded more than once with a slightly different string on the end. Even so, I think we can lay to rest Google’s claim that the data collected by the scraper is only a sample of the available content.
> length(unique(news$EntryURL))
[1] 55563
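One way to tighten that count would be to strip the query string and fragment from each URL before de-duplicating. The sketch below is an illustration, not the method behind the figures above. `normalize_url` is a hypothetical helper and the sample URLs are invented. It assumes that `EntryURL` holds the publisher's own address rather than a redirect wrapper, and that everything after a `?` or `#` is tracking residue, which will not hold for outlets that identify articles with a query parameter.

```r
# Hypothetical helper: drop the query string, the fragment, and any trailing
# slash so that tracking variants of one article collapse to a single string.
normalize_url <- function(url) {
  url <- sub("[?#].*$", "", url)  # remove everything from the first ? or #
  sub("/+$", "", url)             # remove trailing slashes
}

# Invented examples: three variants of one article plus one other article.
sample_urls <- c(
  "https://example.com/news/story-one?utm_source=rss",
  "https://example.com/news/story-one?utm_source=alert&utm_medium=email",
  "https://example.com/news/story-one/",
  "https://example.com/news/story-two#comments"
)

raw_count <- length(unique(sample_urls))
clean_count <- length(unique(normalize_url(sample_urls)))
cat("raw:", raw_count, "normalized:", clean_count, "\n")
```

On the real data the equivalent call would be `length(unique(normalize_url(news$EntryURL)))`, which gives a more conservative estimate of the number of distinct articles than the raw count.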
In order to check our work, we can take advantage of my favorite feature of any search engine. Google allows advanced search operators specific to dates, which let any user restrict the results of a query to those added to Google’s index within a specified time period. This gives us a rough, easily replicable estimate of the scale of the available data. Below are the same search terms for roughly the window in which the data was collected (this appears to vary based in part on time zone, on whether articles are published with advanced timestamps, and so on) across six Chrome profiles, each associated with a different account.
You can see above that the mean number of results returned across profiles is 48,000. But there’s more: given the enormous number of results suddenly showing up, alongside the fact that a great many of them are overtly hostile to trans people, it’s worth asking how different this past year is from previous years. While we can’t replicate the exact methodology with the RSS feeds and the alerts, what we can do is repeat the above experiment to get an approximation.
Allowing for variance, a query for “transgender” before:2017-01-01 returns around 300-400 results. The same query, restricted to after that date, returns 51,200, of which 48,400 are dated after New Year’s Day 2023. How is it that 44,800 of them appear to be tagged in the index as being from just this year?
Well, for one, not counting affiliates, Fox News ran well over 600 unique articles this past year on trans people.
Fox, however, is far from the most anti-trans of the bunch. Remember Church Militant, which I blogged about last year? They continue to appear in Google News unabated, including articles which flagrantly violate Google News’ stated policy. Their website is replete with references to “Sodom and Gomorrah” and regularly reminds visitors that “a war is coming.” Other notable entries include Patriot Post, which played a large role in laundering the election fraud claim which led to the January 6th attack on Capitol Hill. Their content was served 65 times, accounting for 35 unique articles, all of which were viciously anti-trans.
While only a handful of LGBTQ-specific outlets appear in the index at all, 24 separate Catholic news sites were served, an enormous number of which are ardently and often cruelly anti-trans. Catholic News Agency was served 197 times, including this article, which claims that 11% of transgender genital surgeries were on minors. The Blaze was served 173 times, accounting for 91 unique articles, including one titled “Biden’s transsexual assistant secretary of health suggests America will soon embrace genital mutilation of children”. While a slightly longer version of the title now appears at the top of the article, the URL slug preserves the original. Meanwhile, the New American (41x, 25 unique) boasts “LGBT Organization Helps Children Skirt Laws Protecting Them from Transgender Butchery”.
Although Church Militant only appears a little more than a dozen times (which is a dozen times too many) in the dataset, The Lion, a propaganda arm of the Stanley Herzog Foundation, appears 134 times, equating to roughly one anti-trans news story every two to three days for the entire year. The Lion frequently manipulates news stories to misrepresent the basic facts, as I’ve shared previously here and here. To speak directly to Google’s claim that an item being served to the alerts RSS feed doesn’t replicate the actual user experience, let me assure you that I’d never heard of The Lion until they were getting pushed to my top results in Google News.
The same is true for the American Family Association, which appears regularly in my organic (manually searched) Google News results. The AFA has a slight advantage, in that they offer their own advertising and banner ad network. There may be a clue here…
Search engines like Google are highly susceptible to pay-to-play strategies where astroturfed groups invest large amounts of money in targeted advertising to drive up clicks. The advent of the aggregator website also means that these ads often get reproduced as organic links, inevitably elevating the ranking of those with money. The Lion is a regular customer of Google Ads, some of which can be seen here in Google’s Ad Transparency Center.
Consider Matt Walsh’s Daily Wire, home of Michael “eradicate transgenderism” Knowles. If we restrict results to those served from DW’s main domain (stored in the “pullURL” variable below, which doesn’t account for the personal websites of figureheads like Walsh, Knowles or Shapiro), it accounts for 159 unique news articles in the index across keywords. Compare this to Media Matters, which was served to the RSS feeds only 8 times in total when counting unique articles. DW has spent well over a million dollars in the past year on Meta’s Instagram and Facebook platforms.
> length(unique(news$EntryURL[which(news$pullURL=="www.dailywire.com/")]))
[1] 159
> length(unique(news$EntryURL[which(news$pullURL=="www.mediamatters.org/")]))
[1] 8
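The same comparison can be made for every outlet at once instead of one domain at a time. The following is a rough sketch under stated assumptions: it uses a toy `news` data frame with invented rows, keeps only the two column names that appear in the queries above (`EntryURL` and `pullURL`), and `summary_by_domain` is a name introduced here for illustration.

```r
# Toy stand-in for the real `news` data frame. Only the column names
# EntryURL and pullURL come from the post; the rows are invented.
news <- data.frame(
  EntryURL = c("https://a.example/1", "https://a.example/1",
               "https://a.example/2", "https://b.example/1"),
  pullURL  = c("a.example/", "a.example/", "a.example/", "b.example/"),
  stringsAsFactors = FALSE
)

# Rows served per domain, duplicates included.
served <- table(news$pullURL)

# Unique article URLs per domain.
unique_articles <- tapply(news$EntryURL, news$pullURL,
                          function(u) length(unique(u)))

# Combine both counts into one table, most unique articles first.
summary_by_domain <- data.frame(
  domain = names(served),
  served = as.integer(served),
  unique = as.integer(unique_articles[names(served)]),
  stringsAsFactors = FALSE
)
summary_by_domain <- summary_by_domain[order(-summary_by_domain$unique), ]
print(summary_by_domain)
```

Reporting both columns side by side matters because the "served" and "unique" figures quoted throughout this post can differ by a factor of two for the same outlet.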
So how else do we account for the astronomical rise in stories? Establishing this with certainty and the rigor it deserves is beyond the capability of the software at this time, but I think there’s a clue when we look at the data. The precipitous rise in US news coverage begins in earnest in April and is sustained through the rest of the data. There are several possible explanations for this, the first of which is that a series of inflammatory campaigns began at that time, including the onslaught of harassment and threats targeted at Dylan Mulvaney. As Media Matters points out, conservative outlets covered this one story more than nearly any other, including stories about conservative flagship topics such as abortion.
The search term “mulvaney” reveals 466 rows in the dataset, 310 of which are unique article URLs. A manual search of that word from New Year’s Day 2023 onward shows a total of 904 results, again suggesting that the dataset represents a significant sample of the relevant news content.
This time also corresponded with the moment in the legislative session when bills started to reach their apex. As part of coordinated strategies, anti-LGBTQ+ groups have poured astonishing amounts of money into campaigns to ramrod discriminatory bills through. Local news outlets are quick to assist, whether willingly or otherwise, by running press releases as news.
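A keyword count like that one can be sketched as follows. This is an illustration rather than the query actually used: the rows are invented, `EntryTitle` is a column name I am assuming exists alongside `EntryURL`, and matching on both title and URL is my own choice. The final ratio simply divides the 310 unique URLs by the 904 manual search results quoted above, which is only a rough comparison since the two counts come from different sources.

```r
# Toy stand-in for the real data. EntryTitle is an assumed column name and
# every row is invented for illustration.
news <- data.frame(
  EntryTitle = c("Dylan Mulvaney responds to backlash",
                 "Dylan Mulvaney responds to backlash",
                 "Boycott over Mulvaney partnership grows",
                 "State legislature advances bill"),
  EntryURL = c("https://a.example/mulvaney-responds",
               "https://a.example/mulvaney-responds",
               "https://b.example/boycott-grows",
               "https://c.example/bill-advances"),
  stringsAsFactors = FALSE
)

# Case-insensitive match on either the title or the URL.
hits <- grepl("mulvaney", news$EntryTitle, ignore.case = TRUE) |
  grepl("mulvaney", news$EntryURL, ignore.case = TRUE)

matching_rows <- sum(hits)                               # rows, duplicates included
matching_unique <- length(unique(news$EntryURL[hits]))   # distinct article URLs
cat("rows:", matching_rows, "unique URLs:", matching_unique, "\n")

# Rough coverage implied by the figures in the text: 310 unique URLs in the
# dataset against 904 results from the manual search.
coverage_pct <- round(310 / 904 * 100)
cat("approximate coverage:", coverage_pct, "percent\n")
```

On those numbers the dataset holds roughly a third of what the manual search returns for this one term.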
That jump also happens to be two weeks after the release of GPT-4, the model behind the latest version of ChatGPT, on March 14, 2023. ChatGPT is built on a type of AI known as a large language model (LLM), which can mimic human language by learning from other writing used as training data. Enormous concerns have arisen in the past year around the role of LLMs and other generative models in the theft of creative works like writing and visual art. Many of the most lucrative news sites aren’t actual news sites at all, but automated clearinghouses that use language models to scrape the original text and alter it just enough to evade the penalties built into the most widely used search engines.