These data come from a group of Twitter searches conducted at several times during calendar year 2017. They contain commonly observed words associated with 10 language codes: "ar", "en", "es", "fr", "in", "ja", "pt", "ru", "tr", and "und" (undetermined). Variables include "word" (a potential stop word), "lang" (a two- or three-letter language code), and "p" (a probability value based on the word's frequency position along a normal distribution; higher values mean the word occurs more frequently, lower values mean it occurs less frequently).

stopwordslangs

Format

A tibble with three variables and 255,327 observations

Examples

stopwordslangs
#> # A tibble: 255,327 x 3
#>    lang  word      p
#>    <chr> <chr> <dbl>
#>  1 und   a         1
#>  2 und   de        1
#>  3 und   i         1
#>  4 und   no        1
#>  5 und   3         1
#>  6 und   lol       1
#>  7 und   la        1
#>  8 und   1         1
#>  9 und   yes       1
#> 10 und   lt        1
#> # ... with 255,317 more rows
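The three variables are typically combined by choosing one "lang" code, keeping the "word" entries whose "p" is high enough, and dropping those words from tokenized text. The sketch below shows that pattern in base R on a small made-up data frame with the same columns. The toy words and p values, the helper names get_stopwords() and remove_stopwords(), and the 0.99 cutoff are all hypothetical illustrations, not part of this dataset or its package.

```r
# Toy stand-in with the same columns as stopwordslangs (lang, word, p).
# The words and p values here are made up for illustration only.
toy_stopwords <- data.frame(
  lang = c("en", "en", "en", "es", "es"),
  word = c("the", "and", "lol", "de", "la"),
  p    = c(0.999, 0.998, 0.40, 0.999, 0.997),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the words for one language code whose
# p value meets an assumed cutoff (higher p = more frequent word).
get_stopwords <- function(df, language, min_p = 0.99) {
  df$word[df$lang == language & df$p >= min_p]
}

# Hypothetical helper: drop stop words from a vector of lowercase tokens.
remove_stopwords <- function(tokens, stops) {
  tokens[!(tokens %in% stops)]
}

en_stops <- get_stopwords(toy_stopwords, "en")
tokens <- c("the", "cat", "and", "lol", "dog")
remove_stopwords(tokens, en_stops)
#> [1] "cat" "lol" "dog"
```

With the real data loaded, the same helper could be called as get_stopwords(stopwordslangs, "en"); the cutoff is a judgment call, since a lower min_p removes more (and rarer) words.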