Tools of the Trade: Researchers use machine learning to understand historic Canadian economic data


A U of T Mississauga economist is using artificial intelligence to gain new understanding of Canada’s economic history. 

A web-based machine learning tool, developed by an online retail giant for use by business clients, has been retooled to capture historic trade data, helping economic historian Nicholas Zammit bring new information to light while reducing the time and costs associated with his research project.

Zammit studies the long-term economic development and growth of settler economies like Canada and Australia. To do this, he analyzes granular information about imports and exports, tariffs and taxes, and how these relate to government policy of the time.

Zammit’s current research, which focuses on trade diversion and loss in the British dominions during the First World War, draws on primary sources like the Canada Trade Volumes. The digitized federal government documents span nearly a hundred years, from 1870 onwards, and include tables itemizing everything from labour and steel to the number of beer kegs imported from abroad.

The economic historian hopes to shed new light on how costly trade diversions or sanctions can be for countries engaged in war.

Despite the wealth of available information, Zammit faced a challenge shared by many researchers: accessing historic data in a searchable format. Although the trade volumes are newly digitized, they are in PDF format, which doesn’t convert easily into the spreadsheets Zammit uses in his economic analysis.

Zammit had been manually entering data into spreadsheets since 2015, but was frustrated by the quantity of data and the cumbersome process. He estimates it would take one researcher more than 50 years to manually process even the small segment of information he was working with.

“We’ve got the price and quantity of every good traded between Canada and every other country,” says Zammit. “But it’s a very big data set with a lot of data points.”

Automating data collection

The project hit a major breakthrough when Zammit connected with Dev’Roux Maharaj, an undergraduate student studying economics and political science and working part-time with Amazon’s Mississauga operations.

Now in his fifth year with the program and a research assistant on Zammit’s project, Maharaj had worked with Amazon’s customer service team before moving over to Amazon Web Services (AWS), the cloud computing arm of the company. It was there he saw an opportunity to apply AWS technology to Zammit’s data conundrum. 

The solution was Textract, an AWS tool used by organizations, such as insurance companies, to automate and standardize collection of data from forms and other documents.

Maharaj looked to apply Textract’s machine learning abilities to the information contained in the trade volume tables. He connected with the University of British Columbia, which has a leading public-private collaboration with Amazon Web Services through UBC’s Cloud Innovation Centre. The centre offers work-integration opportunities for UBC students, and students there worked with Textract code to refine the platform’s capabilities for automated, high-volume data collection.

Incorporating artificial intelligence into Zammit’s research has been nothing short of transformational.

What took the researcher three years of tedious data entry can now be accomplished in four months.

With just two clicks, the research team can now quickly and easily upload the trade volume PDFs and convert the information for use in an Excel workbook.
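The article does not describe the team’s code. As a rough illustration of what “convert the information” involves, here is a minimal sketch assuming Python, the `boto3` SDK and Textract’s synchronous `AnalyzeDocument` call with the `TABLES` feature, which handles one page per request (multi-page PDFs would need the asynchronous `StartDocumentAnalysis` workflow instead). Textract returns a flat list of “blocks”; each `CELL` block carries a `RowIndex` and `ColumnIndex` and points to the `WORD` blocks it contains, so a table grid can be rebuilt from those fields and saved as CSV, which Excel opens directly. The helper names `analyze_page`, `tables_from_response` and `table_to_csv` are hypothetical.

```python
import csv
import io


def analyze_page(pdf_path):
    """Send one single-page PDF to Textract's synchronous AnalyzeDocument API.

    Assumes boto3 is installed and AWS credentials are configured. boto3 is
    imported lazily so the parsing helpers below work without AWS access.
    """
    import boto3

    with open(pdf_path, "rb") as fh:
        data = fh.read()
    client = boto3.client("textract")
    return client.analyze_document(Document={"Bytes": data}, FeatureTypes=["TABLES"])


def child_ids(block, rel_type="CHILD"):
    """Return the IDs a block links to through relationships of one type."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def tables_from_response(response):
    """Rebuild every TABLE block in a Textract response as a list of rows."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        n_rows = n_cols = 0
        for cid in child_ids(block):
            cell = blocks[cid]
            if cell["BlockType"] != "CELL":
                continue
            # Join the words inside the cell; skip checkboxes and other children.
            words = [blocks[w]["Text"] for w in child_ids(cell)
                     if blocks[w]["BlockType"] == "WORD"]
            r, c = cell["RowIndex"], cell["ColumnIndex"]  # 1-based indices
            cells[(r, c)] = " ".join(words)
            n_rows, n_cols = max(n_rows, r), max(n_cols, c)
        # Fill a dense grid; cells Textract did not report become "".
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables


def table_to_csv(rows):
    """Serialize one reconstructed table as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Typical use would be `tables_from_response(analyze_page("volume_1915_p42.pdf"))` (a made-up file name), writing each table out with `table_to_csv`. Building a dense grid keeps empty cells in place, so columns stay aligned when several pages are stacked into one workbook.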

“We can scan 500 documents in less than 45 minutes,” Maharaj says. 

The process also gives the researchers the ability to easily filter the results. 
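Once the tables are in a structured format, filtering is a simple row-by-row test. The sketch below assumes the extracted data has been saved as CSV with a header row. The function name `filter_rows` and the column names `Country`, `Article` and `Value` are hypothetical, chosen only to echo the kinds of fields the trade volumes record.

```python
import csv
import io


def filter_rows(csv_text, **criteria):
    """Yield rows (as dicts) whose columns equal every keyword criterion.

    Column names are hypothetical; pass whatever headers the extracted
    table actually uses, e.g. filter_rows(text, Country="Australia").
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if all(row.get(col) == value for col, value in criteria.items()):
            yield row
```

In practice the same result could come from Excel’s own filter controls. A scripted filter mainly helps when the same selection has to be repeated across hundreds of converted files.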

“Now the data is organized in the exact format that we need it to be,” Maharaj continues. “The cloud has enabled us to put this project on steroids.”

Expanding the research

Zammit notes the project has also created research opportunities for students to participate and gain valuable experience working with economic data.

A growing team of 175 volunteer undergraduate student research assistants manages quality assurance by conducting comparative spot checks. Maharaj estimates Textract’s accuracy rate to be between 93 and 95 per cent.
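The article does not say how the 93 to 95 per cent estimate was produced. One common way to turn spot checks into an accuracy figure is to compare a sample of extracted values against hand-checked ones and report the match rate with a confidence interval. The sketch below assumes exact string matching after trimming whitespace and uses the simple normal-approximation (Wald) interval. The function name `spot_check_accuracy` is hypothetical.

```python
import math


def spot_check_accuracy(pairs, z=1.96):
    """Estimate extraction accuracy from (extracted, hand_checked) value pairs.

    Returns (match_rate, lower, upper), where the bounds are a
    normal-approximation confidence interval clipped to [0, 1].
    z=1.96 corresponds to roughly 95 per cent coverage.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one spot-checked value")
    matches = sum(1 for extracted, checked in pairs
                  if extracted.strip() == checked.strip())
    p = matches / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, 94 matches out of 100 spot-checked cells gives a match rate of 0.94 with an interval of about plus or minus 0.047. Larger samples narrow the interval, which is one reason a big pool of volunteer checkers is useful.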

For his part, Maharaj has been able to parlay skills learned on the project into an internship position with RBC Banking. 

“Making training plans, looking at Excel data and macros: it’s the same thing, process automation,” he says. “These skills are very applicable to the workforce, and we’re giving students those skills as well.”

The digital tool allows Zammit and Maharaj to collect and analyze even more data and expand the scope of the research. The researchers hope to release preliminary results from their analysis in early 2022.

“We were going to focus on the war period, but given how successful the software is for us, we might go back to 1870,” Zammit says. “It’s blossoming, hopefully, into multiple papers.”

Zammit says the tool has reduced the cost of collecting information and increased the volume of available data, offering the researchers new opportunities to make comparisons with modern economic phenomena.

“This allows me to work with high quality data that will improve the quality of our research,” says Zammit. “The possibilities are limitless.”