The Inner Workings of Plagiarism Detection Technology

Technology can be used to identify potentially plagiarized content in a number of ways. This post examines those approaches and explains how Turnitin uses search technology and content comparison algorithms to help educators teach students to attribute sources appropriately.

Plagiarism has always existed as a problem - the origins of the word date back to the 1st century. Only recently, however, has plagiarism become a significant concern not just for educators and researchers, but also in the public sphere. New instances of plagiarism seem to hit the news on a daily basis. Whether it involves song lyrics, school officials, government ministers, speeches by political figures, or work done in the classroom, incidents of plagiarism appear to be on the rise everywhere.

We have the internet to thank for that. With the rise of the internet, we've seen exponential growth in the amount of content created and made readily available almost everywhere. The growth is happening on such a large scale that we can hardly grasp how big a change in content creation we're witnessing. In 2013, one estimate pegged the total amount of internet content at 14.3 trillion pages (article). The growth is also so fast that we have no way to accurately determine the number of new pages created each day or the total amount of content that currently exists online. The best estimates suggest there are 47 billion indexed and searchable web pages (article). To put these numbers into perspective, it would take approximately 300 trillion sheets of paper to print out the entire internet today.