Dr. Drang put up a year-in-review post in which he uses some shell commands to look at his 2015 post and word counts.

I thought it would be fun to do the same for Take no one’s word for it, except that I wanted to look further back than 2015 and get the data into R so I could cook up some plots. You gotta have your plots.

[Figure: Total words over the years]

Number of posts

This is Dr. Drang’s one-line bash command to get post counts per month.

for d in `ls`; do echo -n $d; ls $d | wc -l; done

I wanted the output to include the year and month of each post count, so I modified his one-liner to become:

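#!/bin/bash
# print year, month and number of posts, separated by tabs
for ii in {2012..2015}; do
    for jj in $(seq -f "%02g" 1 12); do
        printf '%s\t%s\t' "$ii" "$jj"
        # ls prints one file per line when piped, so count lines rather than words
        ls ~/sherifsoliman/_posts/"$ii-$jj"* 2> /dev/null | wc -l
    done
done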

Which gives me this output:

2012	01	0
2012	02	0
2012	03	0
2012	04	1
2012	05	2
2012	06	0
...

Visualized:

[Figure: Number of posts by month and year]

[Figure: Number of posts by year]
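
Piping ls into wc works here, but the result depends on how ls prints file names. As a minimal sketch of the same monthly count using only shell globbing, the function below loops over the glob matches and counts the ones that exist. The name count_posts and its three arguments (posts directory, first year, last year) are made up for this illustration, and it assumes post file names start with YYYY-MM, as in the script above.

```shell
# count_posts DIR FIRST_YEAR LAST_YEAR
# Hypothetical helper: prints "year<TAB>month<TAB>count" for every month.
# Assumes post file names begin with YYYY-MM.
# The body runs in a subshell ( ... ) so its variables stay local.
count_posts() (
    dir=$1 first=$2 last=$3
    year=$first
    while [ "$year" -le "$last" ]; do
        for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
            n=0
            for f in "$dir/$year-$month"*; do
                # An unmatched glob stays literal, so only count entries that exist
                if [ -e "$f" ]; then n=$((n + 1)); fi
            done
            printf '%s\t%s\t%s\n' "$year" "$month" "$n"
        done
        year=$((year + 1))
    done
)
```

Called as count_posts ~/sherifsoliman/_posts 2012 2015, it should print the same tab-separated year, month, and count rows as the loop above.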

Word counts

Here’s Dr. Drang’s one-liner for calculating word counts:

for d in `ls`; do echo -n $d; cat $d/* | wc -w; done

And this is my modified version:

#!/bin/bash
# print year, month and number of words, separated by tabs
for ii in {2012..2015}; do
    for jj in $(seq -f "%02g" 1 12); do
        printf '%s\t%s\t' "$ii" "$jj"
        cat ~/sherifsoliman/_posts/"$ii-$jj"* 2> /dev/null | wc -w
    done
done

Which gives me this output[1]:

2012	01	0
2012	02	0
2012	03	0
2012	04	652
2012	05	1809
2012	06	0
...

[Figure: Words by month and year]

[Figure: Words by year]

This is the R code that bakes data into plots:

# R script to read and plot
# Take no one's word for it word and post
# count data
# January 1, 2016

library(ggplot2)
library(scales)

# word count data
word <- read.table("~/Desktop/wordcount.txt", header = FALSE)
colnames(word) <- c("year", "month", "words")
word$year <- as.factor(word$year)

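# every date uses the same dummy year (2012) so the yearly facets share one month axis
word$date <- paste("2012", word$month, "01", sep = "-")
# scale_x_datetime() expects POSIXct; UTC matches the default of date_format()
word$date <- as.POSIXct(word$date, tz = "UTC")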

# post number data
post <- read.table("~/Desktop/postcount.txt")
colnames(post) <- c("year", "month", "posts")
post$year <- as.factor(post$year)
# same dummy-year trick for the post counts
post$date <- paste("2012", post$month, "01", sep = "-")
post$date <- as.POSIXct(post$date, tz = "UTC")

# define theme color for words and posts
wordStripColor <- "deepskyblue1"
postStripColor <- "darkorange"

## plots

# Words by month and year
svg("~/sherifsoliman/assets/imgs/20160101-tnowfi-words/wordsByMonthAndYear.svg", bg = "transparent")
wordsByMonthAndYear <- ggplot(word, aes(x = date, y = words)) +
  geom_line(group = 1) +
  geom_point() +
  theme_bw() +
  theme(strip.background = element_rect(fill = wordStripColor)) +
  scale_x_datetime(date_breaks = "1 month", labels = date_format("%b")) +
  facet_grid(year ~ .) +
  xlab("Months") +
  ylab("Words") +
  ggtitle("Words by month and year")
                   
wordsByMonthAndYear
dev.off()

# Posts by month and year
svg("~/sherifsoliman/assets/imgs/20160101-tnowfi-words/postsByMonthAndYear.svg", bg = "transparent")
postsByMonthAndYear <- ggplot(post, aes(x = date, y = posts)) +
  geom_line(group = 1) +
  geom_point() +
  theme_bw() +
  theme(strip.background = element_rect(fill = postStripColor)) +
  scale_x_datetime(date_breaks = "1 month", labels = date_format("%b")) +
  facet_grid(year ~ .) +
  xlab("Months") +
  ylab("Posts") +
  ggtitle("Posts by month and year")
postsByMonthAndYear
dev.off()

# totals
# Total words by year
wordtotals <- aggregate(words ~ year, data = word, sum)

svg("~/sherifsoliman/assets/imgs/20160101-tnowfi-words/totalWordsByYear.svg", bg = "transparent")
totalWordsByYear <- ggplot(wordtotals, aes(x = year, y = words)) +
  geom_bar(aes(fill = words), stat = "identity") +
  scale_fill_continuous(low = "snow2", high = wordStripColor, guide = "none") +
  scale_y_continuous(breaks = seq(0, max(wordtotals$words), by = 2000)) +
  theme_minimal() +
  xlab("Years") +
  ylab("Words") +
  ggtitle("Words per year")

totalWordsByYear
dev.off()

# Total posts by year
posttotals <- aggregate(posts ~ year, data = post, sum)

svg("~/sherifsoliman/assets/imgs/20160101-tnowfi-words/totalPostsByYear.svg", bg = "transparent")
totalPostsByYear <- ggplot(posttotals, aes(x = year, y = posts)) +
  geom_bar(aes(fill = posts), stat = "identity") +
  scale_fill_continuous(low = "snow2", high = postStripColor, guide = "none") +
  # seq(0:n) returns 1..(n + 1); seq(0, n) gives the intended breaks 0..n
  scale_y_continuous(breaks = seq(0, max(posttotals$posts))) +
  theme_minimal() +
  xlab("Years") +
  ylab("Posts") +
  ggtitle("Number of posts by year")

totalPostsByYear
dev.off()
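
The yearly totals that aggregate() computes in the R script can be cross-checked in the shell before plotting. This is a rough sketch, assuming the whitespace-separated year, month, count layout of the output files above. The name yearly_totals is a hypothetical helper, not something from Dr. Drang's post or from my scripts.

```shell
# yearly_totals FILE
# Hypothetical helper: sums the third column (count) for each year in the
# first column and prints "year<TAB>total", sorted by year.
yearly_totals() {
    awk '
        NF >= 3 { total[$1] += $3 }    # skip blank or short lines
        END { for (year in total) printf "%s\t%d\n", year, total[year] }
    ' "$1" | sort
}
```

Running yearly_totals ~/Desktop/wordcount.txt should print one line per year with the summed word count, and the same call on postcount.txt should give the post totals.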

Future

I wrote a lot more in 2015 than in previous years. The totals for posts and words are each close to double, or more than double, those of the next best year, and there are fewer months with no posts at all.

I enjoy writing and I want to keep this trend going into 2016 and beyond. It takes effort and time to write each post, especially if code is involved, but I think I get better and faster with practice.

  1. Dr. Drang’s posts are “kept in a Dropbox directory called source, with subdirectories for each year and sub-subdirectories for each month.” The first shell command in his post, ls */* | wc -l, and specifically the */* part, handles that organization. I keep all my posts in one flat directory, and I wrote my scripts accordingly. If you want to use any of this code, you’ll naturally need to modify it to fit your file structure.
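
For a nested layout like the one Dr. Drang describes, the loops over years and months can be replaced by a glob over the directories themselves. The following is only a sketch under assumptions: nested_counts is a hypothetical name, and it assumes a SOURCE/YEAR/MONTH/ tree whose month directories contain nothing but post files. It is not Dr. Drang's code.

```shell
# nested_counts SRC
# Hypothetical helper for a SRC/YEAR/MONTH/post layout: prints
# "year<TAB>month<TAB>posts<TAB>words" for each month directory.
# The body runs in a subshell ( ... ) so its variables stay local.
nested_counts() (
    src=$1
    for monthdir in "$src"/*/*/; do
        # An unmatched glob stays literal, so skip anything that is not a directory
        [ -d "$monthdir" ] || continue
        monthdir=${monthdir%/}      # drop the trailing slash
        month=${monthdir##*/}       # last path component
        yeardir=${monthdir%/*}
        year=${yeardir##*/}         # second-to-last path component
        posts=0
        for f in "$monthdir"/*; do
            if [ -f "$f" ]; then posts=$((posts + 1)); fi
        done
        # tr strips the padding some wc implementations add
        words=$(cat "$monthdir"/* 2> /dev/null | wc -w | tr -d ' ')
        printf '%s\t%s\t%s\t%s\n' "$year" "$month" "$posts" "$words"
    done
)
```

Because the glob only visits directories that exist, months without a directory are simply missing from the output instead of showing up as zeros.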