The goal of Randomuseragent is to provide easy access to different user-agent strings by randomly sampling from a pool of real strings.
You can install the released version of Randomuseragent from CRAN with:
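Assuming the package is listed on CRAN under the same name, the standard call is:

```r
# Install the released version from CRAN
install.packages("Randomuseragent")
```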
The development version can be installed from GitHub with:
    # install.packages("devtools")
    devtools::install_github("fangzhou-xie/Randomuseragent")
This is a basic example to get random user-agent strings:
    library(Randomuseragent)

    random_useragent()
    #> "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0"

    filter_useragent(min_obs = 50000, software_name = "Safari", operating_system_name = "Mac OS X")
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.59.10 (KHTML, like Gecko) Version/5.1.9 Safari/534.59.10"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.78.2 (KHTML, like Gecko) Version/6.1.6 Safari/537.78.2"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/E7FBAF"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.5.17 (KHTML, like Gecko) Version/8.0.5 Safari/600.5.17"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/6.2.8 Safari/537.85.17"
    #> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8) AppleWebKit/534.50.2 (KHTML, like Gecko) Version/5.0.6 Safari/533.22.3"
Both functions accept the same set of arguments for filtering user-agent strings. Please refer to the documentation of either function for details.
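As a minimal sketch of what "the same set of arguments" means in practice, the snippet below passes identical filters to both functions. It assumes the package is installed and that the filters match at least one string. The variable names `pool` and `ua` are only illustrative.

```r
library(Randomuseragent)

# The same filters, passed to each function
pool <- filter_useragent(software_name = "Safari", operating_system_name = "Mac OS X")
ua   <- random_useragent(software_name = "Safari", operating_system_name = "Mac OS X")

# The single draw should be one of the strings in the filtered pool
ua %in% pool
```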
random_useragent() is very convenient, but it may not be the best choice if you care about performance.
random_useragent() essentially wraps the filter_useragent() function and returns a random string from the resulting pool.
However, if you need to generate lots of strings, i.e. by calling random_useragent() repeatedly, every call first filters all the strings that this package provides and then randomly draws one from the pool. Hence, the subsetting is repeated on every call, which is inefficient.
A better way is to get the string pool directly from filter_useragent() and then do the sampling yourself.
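A minimal sketch of this filter-once approach, assuming the package is installed and that filter_useragent() returns a non-empty character vector. The draw uses base R's sample(); the names `pool` and `uas` and the draw count of 1000 are illustrative choices.

```r
library(Randomuseragent)

# Filter once and keep the resulting pool of strings
pool <- filter_useragent(software_type = "browser")

# Draw as many strings as needed from the cached pool, with replacement
uas <- sample(pool, size = 1000, replace = TRUE)

head(uas, 3)
```

Sampling with replacement lets the number of draws exceed the pool size, which matters when a strict filter leaves only a handful of strings.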
To see the difference, we can time both approaches.
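One way to run such a comparison is sketched below with base R's system.time(). This is an illustrative setup, not necessarily the benchmark that produced the figures quoted in this README, and the timings will vary by machine. It assumes the package is installed and that filter_useragent() works with its default arguments.

```r
library(Randomuseragent)

n <- 5000  # number of repetitions per method

# Method 1: filter and draw on every call
t_wrapper <- system.time(
  for (i in seq_len(n)) random_useragent()
)[["elapsed"]]

# Method 2: filter once, then draw from the cached pool
pool <- filter_useragent()
t_pool <- system.time(
  for (i in seq_len(n)) sample(pool, 1)
)[["elapsed"]]

# Average time per call, in milliseconds
c(wrapper_ms = t_wrapper, pool_ms = t_pool) / n * 1000
```

system.time() has coarse resolution, so a dedicated benchmarking package may give more reliable per-call numbers for the faster method.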
We run each method 5000 times to make a fair comparison. The second method is more than 50 times faster than the first one! That said, the first method takes only 0.2452 ms per call on average, which is already pretty fast. The second method needs about 4.4 µs per call. This is certainly faster, but for most use cases, I don’t think it is worth going this far.
You can type ?random_useragent to see the documentation for the parameters.
min_obs: integer, the minimum number of times a string must have been observed in the dataset. This keeps the most frequently used UAs and removes the less frequently used ones. A larger value returns fewer strings, and hence a smaller set to sample from.
software_name: character vector, name of the software. For example, you can choose to use only software_name = "Chrome", or several together with software_name = c("Safari", "Edge").
software_type: character vector, one or more of "browser", "bot", "application". For web scraping applications, you would most likely choose software_type = "browser" to mimic real browser behavior.
operating_system_name: character vector, name of the operating system. For example, use one or more of "Windows", "Linux", "Mac OS X", "macOS", etc.
layout_engine_name: character vector, e.g. "Gecko", "Blink", etc.
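The parameters above can be combined in a single call. The following is a sketch rather than a recommended configuration: it assumes the package is installed and that each combination matches at least one string, and the min_obs value of 5000 is an arbitrary illustration.

```r
library(Randomuseragent)

# One random string restricted to Chrome browsers on Windows
ua <- random_useragent(
  software_name = "Chrome",
  software_type = "browser",
  operating_system_name = "Windows"
)
ua

# The whole matching pool for two browsers, keeping only
# strings observed at least 5000 times (illustrative threshold)
pool <- filter_useragent(
  min_obs = 5000,
  software_name = c("Safari", "Edge"),
  software_type = "browser"
)
length(pool)
```

If a combination is too strict, the pool may be empty, so checking length(pool) before sampling is a reasonable precaution.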