Quick Guide on SAS vs R vs SPSS
The data science community is always abuzz with debates about the superiority and popularity of one statistical software package over another. Tools such as SAS, SPSS, R, Python and Julia keep challenging each other for the title of best statistical programming language. For someone starting a career as a data analyst, however, it is important to have the right information before choosing one.
The following is a basic comparison of SAS, R and SPSS, showing how they stack up on the parameters that matter from a decision-making perspective.
The key parameters on which we compare them are:
- Cost & Ease of learning
- Software capabilities
- Popularity & Job Prospects
Cost & Ease of learning
Both SAS and SPSS are costly software packages and may be beyond the reach of many individual professionals. If you are a working professional and your organization already has SAS or SPSS installed, cost is not an issue. For everyone else, access to SAS and SPSS is a cost problem, whereas R is free and easily available.
Among the three, SAS is probably the easiest to learn. It has a good, stable GUI, and there is a plethora of content available on the web for learning it. SPSS comes next, with a user-friendly interface and easy-to-use drop-down menus. R is the most difficult of the three to learn, as it requires one to learn and understand coding.
Software capabilities
All three are comparable on performance, with good data handling capabilities and options for parallel computation. Both SAS and SPSS have decent, functional graphical capabilities, but R has the most advanced graphical capabilities of the three.
When it comes to updates, R is quicker off the block because of its open nature, although SAS and SPSS also release updated versions regularly. Because of its wide use in academia, R also sees faster development of new techniques. One advantage of SAS and SPSS releases is that they are issued in a controlled environment and thoroughly tested beforehand. R, on the other hand, accepts open contributions, so there is a greater chance of errors.
Popularity & Job Prospects
According to job-posting data from Indeed.com, SAS leads the pack of statistical software by a big margin.
R sits between SAS and SPSS. R has not only caught up with SPSS in job opportunities, but has surpassed it with around 50% more job postings. Figure 1c compares the number of analytics jobs available for R and SPSS over time. Analytics jobs for SPSS have not changed much over the years, while those for R have been steadily increasing.
Scholarly articles are a good indicator of emerging trends. When it comes to mentions in scholarly articles, SPSS is by far the most dominant package, perhaps due to its balance between power and ease of use. SAS has around half as many mentions, followed by a tight grouping of R, Stata and MATLAB.
On trend analysis, however, R seems to be showing growing use while SAS and SPSS show declining use. When it comes to books carrying the name of the software in their titles, SAS has a huge lead with 576 titles, followed by SPSS with 339 and R with 240. However, blog statistics and LinkedIn and Quora data show R to be way ahead of the others, indicating a lot of activity around R in the data science community. Similarly, R dominates on Kaggle and Stack Overflow: more than 50% of Kaggle winners use R, and the number of Stack Overflow posts on R is 7 times the number of posts on SAS.
While all three are equally capable statistical software packages, R is on a strong upward trend and therefore has an edge over the other two for a student who is just starting a career.
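To put the book-title counts quoted above in proportion, here is a minimal Python sketch that turns them into shares of the combined total and into a simple lead ratio. The three counts come from this article; the function names `title_shares` and `lead_ratio` are made up for illustration, and the shares only cover these three packages, not the whole book market.

```python
# Book-title counts as quoted in this article (SAS, SPSS, R only).
book_titles = {"SAS": 576, "SPSS": 339, "R": 240}


def title_shares(counts):
    """Return each package's share of the combined title count, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def lead_ratio(counts, leader, other):
    """Return how many titles `leader` has for every title `other` has."""
    return round(counts[leader] / counts[other], 2)


shares = title_shares(book_titles)
for name, pct in shares.items():
    print(f"{name}: {pct}% of the combined titles")

print("SAS titles per R title:", lead_ratio(book_titles, "SAS", "R"))
```

On these figures SAS accounts for roughly half of the combined titles and has 2.4 titles for every R title, which is the "huge lead" mentioned above expressed as numbers.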