This post is Part 1 of a 2-part series about using R to download and examine bookmarks using the httr package and the Pinboard API.

Part 2: Checking links and link responses with httr and R

I’ve been using Pinboard to archive links since 2011. Pinboard is great: it describes itself as “a bookmarking website for introverted people in a hurry.”

The site is simple, fast, and lightweight. It’s designed to have no cruft and let you get on with your business.

Pinboard has an API that many mobile and desktop apps use, and that I wanted to use with R. The API documentation is at https://pinboard.in/api.

Authentication

There are two ways to authenticate with the Pinboard API:

  • Regular HTTP Auth using your Pinboard username and password:

    curl https://user:password@api.pinboard.in/v1/method

    or

  • API authentication tokens (you can find your API token in settings/password):

    curl 'https://api.pinboard.in/v1/method?auth_token=user:NNNNNN'

The API documentation asks third-party apps not to use the first method and indicates that apps not using API tokens will be blocked. For a single user’s own script either method works, but I’ll use the API token here.
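In R, the two authentication styles map onto httr roughly as follows. This is only a sketch: the helper names (pinboard_url, get_with_password, get_with_token) are my own, and posts/update is used purely as an example of a lightweight method to call.

```r
# Build the URL for an API method, e.g. 'posts/all'
pinboard_url <- function(method) {
  paste0('https://api.pinboard.in/v1/', method)
}

# Option 1: regular HTTP auth with username and password
get_with_password <- function(method, user, password) {
  httr::GET(pinboard_url(method),
            httr::authenticate(user, password),
            query = list(format = 'json'))
}

# Option 2: API token passed as a query parameter
get_with_token <- function(method, token) {
  httr::GET(pinboard_url(method),
            query = list(auth_token = token, format = 'json'))
}

# Example (not run here because it needs a real account):
# resp <- get_with_token('posts/update', Sys.getenv('pin_token'))
# httr::status_code(resp)
```

Keeping the token version in its own function means the username and password never need to appear in a script at all.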

A note about rate limits

API requests are limited to one call per user every three seconds, except for the following:

  • posts/all: once every five minutes

  • posts/recent: once every minute

Keep that in mind as you experiment with the code.
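One way to stay inside these limits while experimenting is a small throttle that sleeps until enough time has passed since the previous call. The sketch below is my own helper (make_throttle is not part of httr), and it only tracks calls made within the current R session.

```r
# Returns a function that sleeps just long enough to keep successive
# calls at least `min_interval` seconds apart.
make_throttle <- function(min_interval) {
  last_call <- NULL  # time of the previous call, kept in the closure
  function() {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = 'secs'))
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_call <<- Sys.time()
    invisible(NULL)
  }
}

wait_general <- make_throttle(3)    # most methods: one call per 3 seconds
wait_recent  <- make_throttle(60)   # posts/recent: one call per minute
wait_all     <- make_throttle(300)  # posts/all: one call per 5 minutes

# Usage: call the matching throttle right before each request
# wait_general()
# resp <- httr::GET(...)
```

The first call to each throttle returns immediately; only later calls wait.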

R code

The R code to get all your bookmarks is so short and simple it may almost seem like a letdown:

library(httr)

# My API token is saved in an environment file
pinsecret <- Sys.getenv('pin_token')

# GET all my links in JSON
pins_all <- GET('https://api.pinboard.in/v1/posts/all',
                query = list(auth_token = pinsecret,
                             format = 'json'))

# Save the response to file for the analysis script to use
save(pins_all, file = 'pins.RData')

Notes:

  • I’m using the httr package to call the API.

  • To avoid accidentally committing/pushing/publishing my API token, I saved the token as an environment variable in ~/.Renviron (add one line that reads pin_token=user:ABCD01234XYZ567890QE), which lets me look it up in R using Sys.getenv('pin_token').

  • The GET() function calls the URL for getting all links and adds the auth_token as a query parameter. It also asks the API to return the links in JSON format (the default is XML).

  • Finally, I save the response object I get back to disk for the analysis script to use later.

As you can see, the explanation of the code is longer than the code itself.
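If you want the script to fail loudly rather than save a useless response, a slightly more defensive variant could look like this. It is a sketch under my own assumptions: looks_like_token is a hypothetical helper that only checks for the rough user:TOKEN shape, and the status codes in the comment are examples of what I would expect to see, not an exhaustive list.

```r
# Hypothetical helper: TRUE if the value has the rough shape 'user:TOKEN'
looks_like_token <- function(token) {
  is.character(token) && length(token) == 1 &&
    grepl('^[^:]+:[^:]+$', token)
}

pinsecret <- Sys.getenv('pin_token')  # returns '' if the variable is not set

if (!looks_like_token(pinsecret)) {
  message('pin_token is missing or malformed; check ~/.Renviron and restart R')
} else {
  pins_all <- httr::GET('https://api.pinboard.in/v1/posts/all',
                        query = list(auth_token = pinsecret,
                                     format = 'json'))
  # Turn HTTP errors (for example 401 for a bad token, or 429 when
  # rate limited) into R errors before anything is written to disk
  httr::stop_for_status(pins_all)
  save(pins_all, file = 'pins.RData')
}
```

Remember that R reads ~/.Renviron at startup, so a newly added token only shows up in Sys.getenv() after restarting the session.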

You can see the JSON content in the response object using content() and the fromJSON() function from the jsonlite package:

library(magrittr) # so we can use pipe: %>%
library(jsonlite)

# warning: this will dump a lot of output to the console
content(pins_all, as = 'text', encoding = 'UTF-8') %>% fromJSON()
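To get a feel for what fromJSON() returns without dumping everything to the console, here is a small offline sketch. The JSON string is made up by me to resemble a reply with two bookmarks; the field names are illustrative and a real response will have more of them.

```r
library(jsonlite)

# Made-up sample in the rough shape of a posts/all JSON reply
sample_json <- '[
  {"href": "https://example.com/a", "description": "First link",
   "time": "2011-05-01T12:00:00Z", "tags": "r api"},
  {"href": "https://example.org/b", "description": "Second link",
   "time": "2014-09-15T08:30:00Z", "tags": ""}
]'

# A JSON array of objects is simplified into a data frame
pins <- fromJSON(sample_json)

# Compact overview of columns and types instead of a full dump
str(pins)

# Parse the timestamp strings so they can be used for analysis by date
pins$time <- as.POSIXct(pins$time, format = '%Y-%m-%dT%H:%M:%SZ', tz = 'UTC')
nrow(pins)
```

With the real response, str() and head() are usually enough to check that the download worked before moving on.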

In the next post, I’ll talk about cleaning up the data and doing the analysis that motivated the API call: exploring the number of dead links as a function of time and hostname.