Times Insider explains who we are and what we do, and provides behind-the-scenes insights into how our journalism comes together.
As of this morning, programs written by New York Times developers have made more than 10 million requests for Covid-19 data from websites around the world. The data we collect is a daily snapshot of the virus’s ebb and flow, covering every US state and thousands of US counties, cities and ZIP codes.
You may have seen portions of this data in the daily maps and graphics we publish at The Times. Together, these pages, which have involved more than 100 journalists and engineers from across the organization, are the most viewed collection in the history of nytimes.com and a key component of the package of Covid reporting that won The Times the 2021 Pulitzer Prize for public service.
The Times’ coronavirus tracking project is one of several efforts that have helped fill a gap in public understanding of the pandemic left by the lack of a coordinated government response. The Johns Hopkins University Coronavirus Resource Center has collected both national and international case data. And the Covid Tracking Project at The Atlantic deployed an army of volunteers to collect US state data, in addition to testing, demographics and health care facility data.
At The Times, our work began with a single spreadsheet.
In late January 2020, Monica Davey, an editor for the National Desk, asked Mitch Smith, a reporter based in Chicago, to begin gathering information on each Covid-19 case in the United States. One row per case, meticulously reported based on public announcements and manually entered, with details such as age, location, gender and status.
By mid-March, the explosive growth of the virus proved too much for our workflow. The spreadsheet had grown so large that it became unresponsive, and reporters didn’t have enough time to report and manually enter data from the growing list of US states and counties we needed to track.
At this point, many health departments in the country began rolling out Covid-19 reporting efforts and websites to inform their constituents of the local spread. The federal government faced initial challenges in providing a single, reliable set of federal data.
The local data that was available was all over the map, literally and figuratively. Formats and methodologies varied widely from place to place.
Within The Times, a team of newsroom-based software developers was quickly tasked with building tools to automate as much of the data collection work as possible. The two of us (Tiff is a newsroom developer, and Josh is a graphics editor) would end up shaping that growing team.
By March 16, the core app was mostly up and running, but we needed help scraping many more sources. To tackle this massive project, we recruited developers from across the company, many of whom had no newsroom experience, to pitch in temporarily and write scrapers.
By the end of April, we were programmatically collecting data from all 50 states and nearly 200 counties. But the pandemic and our database both seemed to be expanding exponentially.
Also, some notable websites changed multiple times in just a few weeks, which meant we had to rewrite our code over and over again. Our newsroom engineers adapted by streamlining our custom tools, even while those tools were in everyday use.
As many as 50 people outside the scraping team have been actively involved in the day-to-day management and verification of the data we collect. Some of the data is still entered by hand, and all of it is manually verified by reporters and researchers, seven days a week. Rigorous reporting and subject-matter fluency are essential parts of all our roles, from reporter to data reviewer to engineer.
In addition to publishing the data to The Times website, we made our dataset publicly available on GitHub at the end of March 2020 for all to use.
As vaccinations curb the virus’s toll nationwide (33.5 million cases have been reported overall), some health departments and other sources are updating their data less frequently. In contrast, the federal Centers for Disease Control and Prevention has expanded its reporting to include comprehensive figures that were only partially available in 2020.
All of that means some of our own custom data collection can be shut down. Since April 2021, our number of programmatic sources has decreased by almost 44 percent.
Our goal is to reduce that to about 100 active scrapers by late summer or early fall, primarily to monitor potential hot spots.
The dream, of course, is to end our efforts when the threat of the virus is significantly reduced.
A version of this article was originally published on NYT Open, The New York Times blog about designing and building news products.