The `sqpr` package gives you access to the API of the Survey Quality Prediction (SQP) website, a database that contains over 40,000 predictions on the quality of survey questions.

This vignette will cover how to log in to the API and download data interactively. First things first: you need to register on the SQP 3.0 website and confirm your registration through your email.

Got that ready? Let's move on.
First, install and load the package in R with:
```r
devtools::install_github("sociometricresearch/sqpr")
library(sqpr)
```
There are currently three ways of logging in to the API. The first is to set your credentials as environment variables. How do I do that?
```r
Sys.setenv(SQP_USER = 'This is where your email or username goes')
Sys.setenv(SQP_PW = 'This is where your password goes')
```
Once you've executed the previous lines, run `sqp_login()`; if you don't get an error, you're logged in just fine. That easy! The nice thing is that once you've run these lines you can erase them from your script and nobody will ever see your account credentials.
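If you set the variables interactively and would also like to scrub them from the current session, base R's `Sys.unsetenv()` does that:

```r
# Remove the credential variables from the current session's environment
Sys.unsetenv(c("SQP_USER", "SQP_PW"))

# They are now empty strings, i.e. unset
Sys.getenv("SQP_USER")
#> [1] ""
```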
Very similarly, you can also set your credentials as options in your R session.
```r
options(
  SQP_USER = 'This is where your email or username goes',
  SQP_PW = 'This is where your password goes'
)

sqp_login()
```
This is very similar to the previous approach in that you can delete the `options(...)` statement after running it and `sqp_login()` will still find your credentials. Note that both approaches only work for the duration of your R session. If you're interested in setting the `SQP_USER` and `SQP_PW` variables permanently in your system environment (that means not having to write them down every time you open/close your session), then you should check out this chapter on environmental variables.
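As a sketch of the permanent route, an `.Renviron` file in your home directory (which `usethis::edit_r_environ()` opens for you, if you use the usethis package) could contain lines like these, with placeholder values:

```
SQP_USER=your_email@example.com
SQP_PW=your_password
```

R reads this file at startup, so every new session will have both variables already set.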
Finally, there is one last approach, which I don't recommend. You can always log in by placing your credentials inside `sqp_login()`, but that would leave your credentials visible to anyone you share your code with. That means any documents shared on Google Drive, GitHub, Bitbucket, or just a simple share over Gmail. In any case, here's how you'd do it:

```r
sqp_login(
  'This is where your email or username goes',
  'This is where your password goes'
)
```
Once you've run `sqp_login()` once, you're all set to work with the SQP 3.0 API! No need to run it again unless you close the R session.
If you already know the study/question/country/language that you're searching for, please have a look at `get_sqp`, a function that allows you to make a single query to obtain SQP estimates. See for example here.
The SQP 3.0 API has questions nested within studies. Some of these studies/questions might be in many different languages because the SQP 3.0 community is very diverse. For that, we need to access the `id`s of the study and the question of interest. Let's look for all the questions in the European Social Survey round 4 that begin with `tv`. First, we need to figure out the `id` of that study. `find_studies()` accepts a string with your study of interest and searches for similar names in the SQP 3.0 database.
```r
find_studies("ess")
#> # A tibble: 26 x 2
#>       id name
#>    <int> <chr>
#>  1  1206 ASSESSING THE IMPACT OF ORGANISATIONAL CULTURE ON SERVICE DELIVERY IN
#>  2  1441 Business performance
#>  3   934 ESS 2012
#>  4  1915 ESS Pilot R10
#>  5  1847 ESS R10 Pretesting
#>  6     1 ESS Round 1
#>  7  1943 ESS Round 10
#>  8     2 ESS Round 2
#>  9     3 ESS Round 3
#> 10     4 ESS Round 4
#> # … with 16 more rows
```
Aha! There it is. Then, more specifically:
```r
find_studies("ESS Round 4")
#> # A tibble: 1 x 2
#>      id name
#>   <int> <chr>
#> 1     4 ESS Round 4
```
`find_studies` is very helpful for finding patterns in the SQP 3.0 database. However, if you can't find your study, you can use `get_studies()`, which returns all the studies in the SQP 3.0 database, and then perform your search manually.
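As a sketch of that manual route (assuming `get_studies()` returns the same `id`/`name` columns that `find_studies()` shows above):

```r
# Download the full table of studies once, then filter it yourself,
# e.g. with a case-insensitive pattern on the study name
all_studies <- get_studies()
all_studies[grepl("round 4", all_studies$name, ignore.case = TRUE), ]
```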
Ok, so we know our study is there. Which questions are in that study? `find_questions` will do the work for you.
```r
find_questions("ESS Round 4", "tv")
#> # A tibble: 129 x 5
#>       id study_id short_name country_iso language_iso
#>    <int>    <int> <chr>      <chr>       <chr>
#>  1 66819        4 TvTot      AT          deu
#>  2 66821        4 TvPol      AT          deu
#>  3 66727        4 PrtVtxx    AT          deu
#>  4  8183        4 TvTot      BE          fra
#>  5 24064        4 TvPol      BE          fra
#>  6 24005        4 PrtVtxx    BE          fra
#>  7  8137        4 TvTot      BE          nld
#>  8 27392        4 TvPol      BE          nld
#>  9 27332        4 PrtVtxx    BE          nld
#> 10  8275        4 TvTot      BG          bul
#> # … with 119 more rows
```
That might take a while because it’s downloading all of the data to your computer.
Hmm, that search was too broad: as we can see, there are questions that don't begin with `tv`. We can use regular expressions to find more precise patterns. For example, `^tv` will find all questions that start (`^`) with `tv`.
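You can see the difference with base R's `grepl()` on a few of the short names above (lowercased here, since `find_questions` matches them regardless of case):

```r
# An unanchored pattern matches "tv" anywhere in the name,
# so "prtvtxx" matches too
grepl("tv", c("tvtot", "tvpol", "prtvtxx"))
#> [1] TRUE TRUE TRUE

# Anchoring with "^" requires the name to *start* with "tv"
grepl("^tv", c("tvtot", "tvpol", "prtvtxx"))
#> [1]  TRUE  TRUE FALSE
```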
```r
tv_qs <- find_questions("ESS Round 4", "^tv")
tv_qs
#> # A tibble: 88 x 5
#>       id study_id short_name country_iso language_iso
#>    <int>    <int> <chr>      <chr>       <chr>
#>  1 66819        4 TvTot      AT          deu
#>  2 66821        4 TvPol      AT          deu
#>  3  8183        4 TvTot      BE          fra
#>  4 24064        4 TvPol      BE          fra
#>  5  8137        4 TvTot      BE          nld
#>  6 27392        4 TvPol      BE          nld
#>  7  8275        4 TvTot      BG          bul
#>  8 16155        4 TvPol      BG          bul
#>  9 21064        4 TvTot      HR          hrv
#> 10 21066        4 TvPol      HR          hrv
#> # … with 78 more rows
```
There it is (you can also supply more than one variable name with a character vector like `c("tvtot", "tvpol")`). We get all the `tv` questions in different languages and for different countries. Let's subset the questions for Spanish only.
```r
sp_tv <- tv_qs[tv_qs$language_iso == "spa", ]
sp_tv
#> # A tibble: 2 x 5
#>      id study_id short_name country_iso language_iso
#>   <int>    <int> <chr>      <chr>       <chr>
#> 1  7999        4 TvTot      ES          spa
#> 2 27699        4 TvPol      ES          spa
```
The hard part is done now. Once we have the `id`s of our questions of interest, we supply them to `get_estimates` and it will fetch the quality predictions for those questions.
```r
get_estimates(sp_tv$id)
#> # A tibble: 2 x 4
#>   question reliability validity quality
#>   <chr>          <dbl>    <dbl>   <dbl>
#> 1 tvtot          0.713    0.926    0.66
#> 2 tvpol             NA       NA      NA
```
`get_estimates` returns all question names in lower case to increase the chances of matching the names in the study's questionnaire. It also returns empty fields for questions that don't have any predictions, like `tvpol` in the example above. Moreover, `get_estimates` gives you the option of grabbing more columns from the SQP 3.0 API, containing estimates such as standard errors and interquartile ranges of the predictions above. See `?get_estimates` for a list of all available columns. You can do that like this:
```r
get_estimates(sp_tv$id, all_columns = TRUE)
#> # A tibble: 2 x 23
#>   question reliability validity quality question_id    id created routing_id
#>   <chr>          <dbl>    <dbl>   <dbl>       <int> <dbl> <chr>        <dbl>
#> 1 tvtot          0.713    0.926    0.66        7999 11345 2020-0…          1
#> 2 tvpol             NA       NA      NA       27699    NA <NA>            NA
#> # … with 15 more variables: authorized <dbl>, complete <dbl>, user_id <dbl>,
#> #   error <dbl>, errorMessage <dbl>, reliabilityCoefficient <dbl>,
#> #   validityCoefficient <dbl>, methodEffectCoefficient <dbl>,
#> #   qualityCoefficient <dbl>, reliabilityCoefficientInterquartileRange <list>,
#> #   validityCoefficientInterquartileRange <list>,
#> #   qualityCoefficientInterquartileRange <list>,
#> #   reliabilityCoefficientStdError <dbl>, validityCoefficientStdError <dbl>,
#> #   qualityCoefficientStdError <dbl>
```
The nice thing about `get_estimates` is that once you've explored and found your variables, you can save the question `id`s in your script; when you restart your R session you can skip most of the steps above and get your estimates with only `sqp_login(); get_estimates(question_ids)`, assuming you've set your login information as environment variables (or options) and your question ids are in a vector named `question_ids`.
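Putting it all together, a follow-up session could be reduced to a short script like this (using the two Spanish question ids found above; `question_ids` is just a name chosen for the example):

```r
library(sqpr)

# Credentials are picked up from the SQP_USER/SQP_PW environment
# variables (or options) set beforehand
sqp_login()

# Question ids saved from the earlier interactive exploration
question_ids <- c(7999, 27699)
get_estimates(question_ids)
```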
I hope that gives a general idea of how to explore the SQP 3.0 API interactively. For more examples of how this blends with the other `sqpr` functions, check out the case-study vignette of the `sqpr` package.