Readability of social media privacy policies

How difficult is it to read the terms of service?

Grades 5 - 9

Ever find yourself quickly clicking “I understand and agree” on the lengthy terms and conditions of a service you've just signed up for? Many people do, given how intricate the language often is.

Just how tough is it to read these terms? Our data visualization explores the readability of privacy policies on popular social media sites.

To answer our question:

  • We used the Princeton-Leuven Longitudinal Privacy Policy Dataset, an archive of website policies that spans 20+ years and hosts over 1 million policies. We picked the policies from popular social media sites: TikTok, Twitter, Facebook, Instagram, YouTube, and Pinterest.
  • We analyzed the readability of the privacy policies by looking at their grade level score, reading score, and reading time.

Visualizing the data

The visualization below shows three measures of readability for the privacy policies of TikTok, Twitter, Facebook, Instagram, YouTube, and Pinterest. You can toggle between reading time, readability score, and grade level to compare how each company performs on each measure.

Readability score: Measures how easy a document is to read. We applied the Dale-Chall readability formula, which uses a list of 3,000 words that an average 4th-grade student in America easily understands. Any word outside that list is counted as difficult, and the score combines the percentage of difficult words with the average sentence length, so higher scores mean harder text. A score of 9.0-9.9 means you need to be a college student to understand the text, while a score of 4.9 or lower means an average 4th-grade student, or someone younger, can understand it.
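To make the formula concrete, here is a minimal Python sketch of the New Dale-Chall score: 0.1579 times the percentage of difficult words, plus 0.0496 times the average sentence length, plus 3.6365 when more than 5% of the words are difficult. The `EASY_WORDS` set is a tiny hypothetical stand-in for the real 3,000-word list, and the helper names are illustrative. Textstat's own implementation may differ in details such as tokenization and how difficult words are counted, so treat this as a sketch rather than the library's code.

```python
import re

# Hypothetical stand-in for the Dale-Chall list of about 3,000 familiar words.
# The real list is far longer; this tiny set only makes the example run.
EASY_WORDS = {"we", "you", "your", "help", "use", "the", "a", "to", "and", "of"}


def word_tokens(text: str) -> list[str]:
    """Lowercase words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_count(text: str) -> int:
    """Count sentences by splitting on ., ! and ? (a simplification)."""
    return max(1, len([s for s in re.split(r"[.!?]+", text) if s.strip()]))


def dale_chall_score(text: str, easy_words: set[str] = EASY_WORDS) -> float:
    """New Dale-Chall score: higher values mean harder text."""
    words = word_tokens(text)
    if not words:
        return 0.0
    difficult = sum(1 for w in words if w not in easy_words)
    pct_difficult = 100 * difficult / len(words)
    avg_sentence_length = len(words) / sentence_count(text)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_length
    # Adjustment applied when more than 5% of the words are difficult.
    if pct_difficult > 5:
        score += 3.6365
    return score


print(round(dale_chall_score("We help you."), 4))           # 0.1488
print(round(dale_chall_score("We collect telemetry."), 2))  # 14.31
```

Notice how two unfamiliar words in a three-word sentence push the score far past 9.9, because both the difficult-word percentage and the 3.6365 adjustment kick in.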

Grade score: Indicates the number of years of education needed to understand the text. We applied the Flesch-Kincaid grade level formula, which combines the average sentence length with the average number of syllables per word. A score of 5 means that a fifth grader can generally understand the text.
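The Flesch-Kincaid grade level is 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The Python sketch below computes it with a simple vowel-group syllable counter; that counter is an assumption made for illustration, and Textstat counts syllables more carefully, so its scores will not match exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (treating y as a vowel).

    Textstat and other tools count syllables more carefully, so their
    results will differ somewhat from this approximation.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "share") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: roughly, years of schooling needed."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len([s for s in re.split(r"[.!?]+", text) if s.strip()]))
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / sentences
            + 11.8 * syllables / len(words)
            - 15.59)


print(flesch_kincaid_grade("We use your data."))
print(flesch_kincaid_grade(
    "We gather information regarding your interactions with advertisements."))
```

With this heuristic the first sentence scores about 0.7 and the second about 18.5: long words with many syllables drive the grade up quickly, which is one reason policy language can score like an academic paper.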

Reading time: Estimates how long it takes an average person to read the text, assuming a reading pace of 14.69 milliseconds per character (a millisecond is one-thousandth of a second).
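As a quick worked example, a policy with 20,000 characters would take about 20,000 × 14.69 ms = 293,800 ms, or roughly 293.8 seconds (just under 5 minutes). The Python sketch below does the same arithmetic; it assumes only non-whitespace characters are counted, and the function name is hypothetical.

```python
MS_PER_CHAR = 14.69  # reading pace stated in the article


def reading_time_seconds(text: str, ms_per_char: float = MS_PER_CHAR) -> float:
    """Estimate reading time in seconds from a per-character pace.

    Assumption: only non-whitespace characters are counted.
    """
    chars = sum(len(token) for token in text.split())
    return chars * ms_per_char / 1000


policy = "x" * 20_000  # placeholder text of 20,000 characters
seconds = reading_time_seconds(policy)
print(f"{seconds:.1f} s, about {seconds / 60:.1f} minutes")  # 293.8 s, about 4.9 minutes
```

Because the estimate grows linearly with length, reading time mostly reflects how long a policy is, not how hard its words are.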

We used the Textstat Python library to calculate all scores. More information about the library is available in the Textstat documentation.

The bar graphs show that the privacy policies of social media platforms are difficult to read overall. The grade level scores for TikTok, Twitter, Facebook, Instagram, and YouTube are all above 18, which means their privacy policies are as difficult to read as an academic paper. On the readability score, Pinterest scores 7.16 (understood by an average 9th- or 10th-grade student) and TikTok scores 8.64 (understood by an average 11th- or 12th-grade student).
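To see how a Dale-Chall score translates into a reader level, the Python sketch below encodes the commonly published interpretation bands and applies them to the two scores reported above. The band table and function name are our own illustration, not part of Textstat.

```python
# Conventional Dale-Chall interpretation bands as (exclusive upper bound, reader level).
DALE_CHALL_BANDS = [
    (5.0, "4th grade or lower"),
    (6.0, "5th or 6th grade"),
    (7.0, "7th or 8th grade"),
    (8.0, "9th or 10th grade"),
    (9.0, "11th or 12th grade"),
    (10.0, "college (13th to 15th grade)"),
]


def dale_chall_level(score: float) -> str:
    """Map a Dale-Chall score to the reader level it usually corresponds to."""
    for upper, level in DALE_CHALL_BANDS:
        if score < upper:
            return level
    return "college graduate"


# The two readability scores reported above.
reported = {"Pinterest": 7.16, "TikTok": 8.64}
for platform, score in reported.items():
    print(f"{platform}: {score} -> {dale_chall_level(score)}")
```

This prints "9th or 10th grade" for Pinterest and "11th or 12th grade" for TikTok, matching the levels given in the text.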

TikTok's policy has the shortest reading time. Twitter's policy has the longest reading time, yet it matches TikTok in reading score and grade level.

Reflect on what you see

Look at and interact with the data visualization above. When you hover the mouse over the bar graph, more information appears. You can toggle between grade score, reading score, and reading time to compare the three measures.

Think about the following questions.

  • What are the reading scores of the different privacy policies?
  • What grade level is needed to understand the privacy policies?
  • Is there a privacy policy that is easier to read than the others, or are they all similar in difficulty?
  • What do you wonder about the data?

Use the fill-in-the-blank prompts to summarize your thoughts.

  • “I used to think _______”
  • “Now I think _______”
  • “I wish I knew more about _______”
  • “These data visualizations remind me of _______”
  • “I really like _______”

Learn how we visualized the data

Go to our walk-through (in Jupyter notebook format) to see how we applied the data science process to create these graphs, from formulating a question to gathering the data and analyzing it with code.