The `sqpr` package gives you access to the API of the Survey Quality Prediction (SQP) website, a database that contains over 40,000 predictions on the quality of survey questions.

This vignette will cover how to log in to the API and download data interactively. First things first: you need to register on the SQP 3.0 website and confirm your registration through your email.

Got that ready? Let's move on.
First, install and load the package in R with:
```r
devtools::install_github("sociometricresearch/sqpr")
library(sqpr)
```
There are currently three ways of logging in to the API. The first is to set your credentials as environment variables. How do I do that?
```r
Sys.setenv(SQP_USER = 'This is where your email or username goes')
Sys.setenv(SQP_PW = 'This is where your password goes')
```
Once you've executed the previous lines, run `sqp_login()`; if you don't get an error, you're logged in just fine. That easy! The nice thing is that once you've run these lines you can erase them from your script and nobody will ever see your account credentials.
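If you set the variables interactively and would also like to scrub them from the current session, base R's `Sys.unsetenv()` does that:

```r
# Remove the credential variables from the current session's environment
Sys.unsetenv(c("SQP_USER", "SQP_PW"))

# They are now empty strings, i.e. unset
Sys.getenv("SQP_USER")
#> [1] ""
```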
Very similarly, you can also set your credentials as options in your R session.
```r
options(
  SQP_USER = 'This is where your email or username goes',
  SQP_PW = 'This is where your password goes'
)

sqp_login()
```
This is very similar to the previous approach in that you can delete the `options(...)` statement after running it and `sqp_login()` will still find your credentials. Note that both approaches only work for the duration of your R session. If you're interested in setting the `SQP_USER` and `SQP_PW` variables permanently in your system environment (that means not having to write them down every time you open/close your session), then you should check out this chapter on environmental variables.
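As a sketch of the permanent route, an `.Renviron` file in your home directory (which `usethis::edit_r_environ()` opens for you, if you use the usethis package) could contain lines like these, with placeholder values:

```
SQP_USER=your_email@example.com
SQP_PW=your_password
```

R reads this file at startup, so every new session will have both variables already set.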
Finally, there is one last approach, which I don't recommend. You can always log in by placing your credentials inside `sqp_login()`, but that would leave your credentials visible to anyone you share your code with. That means any documents shared on Google Drive, GitHub, Bitbucket, or just a simple share over Gmail. In any case, here's how you'd do it:

```r
sqp_login(
  'This is where your email or username goes',
  'This is where your password goes'
)
```
Once you've run `sqp_login()` once, you're all set to work with the SQP 3.0 API! No need to run it again unless you close the R session.
If you already know the study/question/country/language that you're searching for, please have a look at `get_sqp`, a function that allows you to make a single query to obtain SQP estimates. See for example here.
The SQP 3.0 API has questions nested within studies. Some of these studies/questions might be in many different languages because the SQP 3.0 community is very diverse. For that, we need to access the `id`s of the study and the question of interest. Let's look for all the questions in the European Social Survey round 4 that begin with `tv`. First, we need to figure out the `id` of that study. `find_studies()` accepts a string with your study of interest and searches for similar names in the SQP 3.0 database.
```r
find_studies("ess")
#> # A tibble: 26 x 2
#>       id name
#>    <int> <chr>
#>  1  1206 ASSESSING THE IMPACT OF ORGANISATIONAL CULTURE ON SERVICE DELIVERY IN
#>  2  1441 Business performance
#>  3   934 ESS 2012
#>  4  1915 ESS Pilot R10
#>  5  1847 ESS R10 Pretesting
#>  6     1 ESS Round 1
#>  7  1943 ESS Round 10
#>  8     2 ESS Round 2
#>  9     3 ESS Round 3
#> 10     4 ESS Round 4
#> # … with 16 more rows
```
Aha! There it is. Then, more specifically:
```r
find_studies("ESS Round 4")
#> # A tibble: 1 x 2
#>      id name
#>   <int> <chr>
#> 1     4 ESS Round 4
```
`find_studies` is very helpful for finding patterns in the SQP 3.0 database. However, if you can't find your study, you can use `get_studies()`, which returns all the studies in the SQP 3.0 database, and then perform your search manually.
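As a sketch of that manual route (assuming `get_studies()` returns the same `id`/`name` columns that `find_studies()` shows above):

```r
# Download the full table of studies once, then filter it yourself,
# e.g. with a case-insensitive pattern on the study name
all_studies <- get_studies()
all_studies[grepl("round 4", all_studies$name, ignore.case = TRUE), ]
```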
Ok, so we know our study is there. Which questions are in that study? `find_questions` will do the work for you.
```r
find_questions("ESS Round 4", "tv")
#> # A tibble: 129 x 5
#>       id study_id short_name country_iso language_iso
#>    <int>    <int> <chr>      <chr>       <chr>
#>  1 66819        4 TvTot      AT          deu
#>  2 66821        4 TvPol      AT          deu
#>  3 66727        4 PrtVtxx    AT          deu
#>  4  8183        4 TvTot      BE          fra
#>  5 24064        4 TvPol      BE          fra
#>  6 24005        4 PrtVtxx    BE          fra
#>  7  8137        4 TvTot      BE          nld
#>  8 27392        4 TvPol      BE          nld
#>  9 27332        4 PrtVtxx    BE          nld
#> 10  8275        4 TvTot      BG          bul
#> # … with 119 more rows
```
That might take a while because it’s downloading all of the data to your computer.
Hmm, that search was too broad: as we can see, there are questions that don't begin with `tv`. We can use regular expressions to find more precise patterns. For example, `^tv` will find all questions that start (`^`) with `tv`.
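You can see the difference with base R's `grepl()` on a few of the short names above (lowercased here, since `find_questions` matches them regardless of case):

```r
# An unanchored pattern matches "tv" anywhere in the name,
# so "prtvtxx" matches too
grepl("tv", c("tvtot", "tvpol", "prtvtxx"))
#> [1] TRUE TRUE TRUE

# Anchoring with "^" requires the name to *start* with "tv"
grepl("^tv", c("tvtot", "tvpol", "prtvtxx"))
#> [1]  TRUE  TRUE FALSE
```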
```r
tv_qs <- find_questions("ESS Round 4", "^tv")
tv_qs
#> # A tibble: 88 x 5
#>       id study_id short_name country_iso language_iso
#>    <int>    <int> <chr>      <chr>       <chr>
#>  1 66819        4 TvTot      AT          deu
#>  2 66821        4 TvPol      AT          deu
#>  3  8183        4 TvTot      BE          fra
#>  4 24064        4 TvPol      BE          fra
#>  5  8137        4 TvTot      BE          nld
#>  6 27392        4 TvPol      BE          nld
#>  7  8275        4 TvTot      BG          bul
#>  8 16155        4 TvPol      BG          bul
#>  9 21064        4 TvTot      HR          hrv
#> 10 21066        4 TvPol      HR          hrv
#> # … with 78 more rows
```
There it is (you can also supply more than one variable name with a character vector like `c("tvtot", "tvpol")`). We get all the `tv` questions in different languages and for different countries. Let's subset the questions for Spanish only.
```r
sp_tv <- tv_qs[tv_qs$language_iso == "spa", ]
sp_tv
#> # A tibble: 2 x 5
#>      id study_id short_name country_iso language_iso
#>   <int>    <int> <chr>      <chr>       <chr>
#> 1  7999        4 TvTot      ES          spa
#> 2 27699        4 TvPol      ES          spa
```
The hard part is done now. Once we have the `id`s of our questions of interest, we supply them to `get_estimates` and it will fetch the quality predictions for those questions.
```r
get_estimates(sp_tv$id)
#> # A tibble: 2 x 4
#>   question reliability validity quality
#>   <chr>          <dbl>    <dbl>   <dbl>
#> 1 tvtot          0.713    0.926    0.66
#> 2 tvpol             NA       NA      NA
```
`get_estimates` returns all question names in lower case to increase the chances of matching the names in the study's questionnaire. It also returns empty fields for questions that don't have any predictions, like `tvpol` in the example above. Moreover, `get_estimates` gives you the option of grabbing more columns from the SQP 3.0 API, containing estimates such as standard errors and interquartile ranges of the predictions above. See `?get_estimates` for a list of all available columns. You can do that like this:
```r
get_estimates(sp_tv$id, all_columns = TRUE)
#> # A tibble: 2 x 23
#>   question reliability validity quality question_id    id created routing_id
#>   <chr>          <dbl>    <dbl>   <dbl>       <int> <dbl> <chr>        <dbl>
#> 1 tvtot          0.713    0.926    0.66        7999 11345 2020-0…          1
#> 2 tvpol             NA       NA      NA       27699    NA <NA>            NA
#> # … with 15 more variables: authorized <dbl>, complete <dbl>, user_id <dbl>,
#> #   error <dbl>, errorMessage <dbl>, reliabilityCoefficient <dbl>,
#> #   validityCoefficient <dbl>, methodEffectCoefficient <dbl>,
#> #   qualityCoefficient <dbl>, reliabilityCoefficientInterquartileRange <list>,
#> #   validityCoefficientInterquartileRange <list>,
#> #   qualityCoefficientInterquartileRange <list>,
#> #   reliabilityCoefficientStdError <dbl>, validityCoefficientStdError <dbl>,
#> #   qualityCoefficientStdError <dbl>
```
The nice thing about `get_estimates` is that once you've explored and found your variables, you can save the question `id`s in your script; when you restart your R session you can skip most of the steps above and get your estimates with only `sqp_login(); get_estimates(question_ids)`, assuming you've set your login information as environment variables (or options) and your question ids are in a vector named `question_ids`.
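Putting it all together, a follow-up session could be reduced to a short script like this (using the two Spanish question ids found above; `question_ids` is just a name chosen for the example):

```r
library(sqpr)

# Credentials are picked up from the SQP_USER/SQP_PW environment
# variables (or options) set beforehand
sqp_login()

# Question ids saved from the earlier interactive exploration
question_ids <- c(7999, 27699)
get_estimates(question_ids)
```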
I hope that gives a general idea of how to explore the SQP 3.0 API interactively. For more examples of how this blends with the other `sqpr` functions, check out the case-study vignette of the `sqpr` package.