Hello. I'm Mark Jordan, your
instructor for running SAS 9 Programs in SAS Viya. As a SAS programmer,
you may have heard about the massively
parallel processing capabilities
available in SAS Viya. Perhaps, you've gotten
access to SAS Viya at work. Maybe you're taking
a university class using SAS Viya for Learners. Or it could be
you're just wondering what SAS Viya is all about. In any case, I'm really looking
forward to spending a few hours showing you how easy
it is to get your SAS 9 code running in our
high-performance SAS Viya environment. In lesson one, we'll
learn a little bit about what SAS Viya actually
is and how it's architected. And then we'll talk a
little bit about how to work with data in CAS libraries. So let's dive right in. SAS Viya is a great addition
to the SAS platform. The part of SAS Viya
that excites me most is Cloud Analytic Services, or CAS, which provides us with an in-memory processing engine. That means that you can
lift data into memory and leave it there
while you conduct several subsequent
processes on it without having to write it
back to disk in between. And you can do this
safely because of the way the system redundancy is set up. This allows for
extremely fast processing when you're doing things like
manipulating data, and then doing some analysis
on the results. In addition, the way CAS
loads data into memory allows it to handle
data of any size, even if the data is larger
than the physical memory available to you. And it handles it all
with lightning speed. As we mentioned before,
SAS Viya is an addition to the SAS platform. And the SAS 9 engines still
exist within this platform. This course focuses
on those of us who have been programming
in SAS 9 for years and are wondering how we can
best leverage the new additions to the SAS platform. SAS Viya's an open platform. You can run SAS Viya in
a variety of environments in a way that makes it very
flexible to meet your needs. The main question
on our minds is, in this new massively
parallel processing, or MPP, environment, can we still use
the traditional SAS coding methods that we've
become accustomed to? And the answer is yes. As long as you are aware of how SAS Viya works, and of some of the differences that it makes in the way things process, you can certainly use
SAS code that you're familiar with in the
Viya environment. So in this course,
we're going to learn how to connect to SAS Viya from a SAS compute session, whether that's a SAS 9 session or the SAS Compute Server associated with the Viya platform itself. And then we're going
to write code that will allow us to access data in CAS. And we'll run SAS
9 programs and see how we might modify them
to better adapt them to the SAS Viya environment. SAS Viya is exciting because
of the way it accesses data. You can access data
from any source. You can access data that lives
on the on-premises hardware or data that you read
in from the cloud. You can process relational data. You can process
unstructured data. And your data can be saved to
a variety of formats with ease. There's a new format for saving
CAS tables called SASHDAT. This is an exceptionally cool
format for this environment. Basically, it's memory mapped
so that it looks on disk pretty much like it looks in memory. And it is stored
distributed across the nodes in your environment. This data can be
loaded in parallel into RAM at lightning speed, and it is the preferred format for storing tables from CAS
when you write them to disk. In this class, we're going to
use SAS Studio on a pure Viya installation. You're going to see that
the code runs there exactly like it does in a traditional
SAS 9 installation. With a modern
version of SAS 9, you can also use SAS Studio or
Enterprise Guide or the SAS windowing environment even to
connect to SAS Viya and CAS and execute code up there. But it's more open than that. You don't even need a SAS client, really, to run code in Viya. You could do it from
Jupyter Notebook. You could do it from R Studio. You can do it from a variety
of different client interfaces depending on what you need
to build your application. In this course, we're focused
on running in CAS from SAS. The architecture for
SAS Viya is different than our traditional SAS
Compute Server installation. We're going to focus
pretty much on this area here, Cloud Analytics Services. Cloud Analytics
Services is designed to provide a
distributed processing model in which
pieces of the data are loaded across
several machines and are processed in parallel. The data is lifted into
memory, processed once, processed again. Subsequent analysis
can happen all without writing the
data back out to disk. Only when you wish to save the data and stop working with it do you explicitly save it back to the offline storage. Working with CAS is made
significantly easier in SAS Studio by the
presence of snippets. Snippets in the SAS
Studio interface are accessed by this
icon you see over here. And they're broken up into folders for snippets designed for different purposes. In the SAS Viya Cloud Analytic Services folder, there are snippets that act like templates. They allow you to
connect to CAS, to load data into
the CAS memory, to save data back out to the offline storage. Or just to drop it out
of memory, if you like, and to disconnect
from your CAS session. So before we can do
any processing in CAS from our SAS 9
environment, we first have to establish a
connection to a CAS session. So let's start
working with SAS Viya. The first thing we
need to do, of course, is get into SAS Studio. And you can access SAS Studio
by starting the Chrome browser down here on the taskbar. When Chrome opens up, there's a folder up here with bookmarks. We'll just click on SAS Studio. User ID and password are student
and Metadata0, capital M, e-t-a-d-a-t-a, with a
numeric zero on the end. We won't need administrator
authority for anything we're doing in this class. So when we log in, we're going to choose the default, No, when asked about assuming SAS administrator authority. Once SAS Studio is open, you
can see that it's very light. I am an old submarine
sailor and I am very fond of the
new dark themes that are coming so much in vogue. SAS Studio has a dark
theme that I can set. So I'm going to go up to my
Profile here click on Settings. And here, we have a
chance to choose a theme. You'll notice that this light
theme is called illuminate. If you ever want to get back
to it, that's what you use. The dark theme is called ignite. And I'm going to
go ahead and choose that to make things a
little easier on my eyes. Now remember, you're in a browser, right? So if you hold down the Control
button and scroll your mouse, you can make things bigger. And then if you
scroll the other way, you can make things smaller. So it's quite easy to
adjust this environment to work as you would like
it to work with your eyes. Now let's work with snippets
to get access to the CAS. Down here on the
Snippets bar you'll see that there's a whole section
on SAS Viya Cloud Analytic services. And a New CAS Session is the
one that we'll be needing. Now, you can Right-Click
and say Open As A Program, or you can just Double-Click
this and have it Open In A New Tab. This is a template for
starting a session. If we leave it
alone, our session will be called My Session. But you can name
your session anything you'd like as long as
it's a valid SAS name. And then in the
Session Options, you can set some differences from
the default, if you want. This determines how national
language system formats and things are interpreted. And this is the
active CASLIB that we want to use when we
start up our CAS session. Now we could change any of these
things, but we don't need to. We can just run this code. And in the log, you'll
see that my session has been connected to CAS and that CASUSER is the active CASLIB. Now that we have a CAS session established, eventually, we're going to need to turn it off.
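Under the hood, those two snippets boil down to just a couple of statements. Here's a minimal sketch, assuming the default session name MySession; the exact session options in your snippet may differ.

/* start a CAS session named mysession with CASUSER as the active caslib */
cas mysession sessopts=(caslib=casuser timeout=1800 locale="en_US");

/* and, when you're finished, shut the session down */
cas mysession terminate;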
So to terminate our CAS session, we have another snippet. And I'll just Double-Click
Terminate The CAS Session. And you'll notice that because
we accepted the default session name, this is all set already
to terminate my CAS session. So I'm just going to go
ahead and run this code. And we can see that My Session
was successfully terminated. Now, to access files
during the course, you'll need to be able
to find those files and determine the
path to get to them. Here we're going to go
into the Server, Home, and in the Workshop
folder, and sgf20s9v. And you can see that
these are the folders that have our data in them. For example, if we wanted
to write a SAS program that would access this Excel
file, we could Right-Click on that, choose Properties,
and there's the name of the file and the location. Now you notice, we had
to click an awful lot to get down in here. We're going to be using
these files a lot, so it'd be really helpful to
have a shortcut to these files and make it quicker and easier
to get to our course data. We can do this by
taking the file that we want to make a
shortcut to, Right-Click on it and choose Add A Shortcut. Give it a reasonable name. Now, I'm going to click OK. And when I do that, up here
in the folder shortcuts, you'll now see a folder
shortcut for sgf20s9v. And you'll see that it's
a quick way of accessing those same folder and files
that we saw in the workshop. So from now on, I'm
going to use the shortcut to get at the files for
this particular class. Now that we've connected
to a CAS session, the next step would be
to access some data up in the CAS library. In a traditional
SAS environment, we use a LIBNAME to point to
collections of SAS data files that are stored somewhere in a
network location we have access to from our SAS Compute Server. So for example,
in Linux your path might look like this, to a
folder named MySAS, which contains several SAS data sets. We'd like to refer to
this aggregate location for SAS data sets as MySAS. And so we use a
LIBNAME statement to access that folder
location using the Base SAS engine to interpret the data
sets that are stored in there. Our CASLIBs work a
little differently. A CASLIB consists of
both an in-memory space and the physical data
storage space that's offline. When you're working
with CAS data, only the tables that
are loaded in memory are available for processing. The data source can store many different types of data that can be loaded into memory quickly and easily in the CASLIB. Those sources are not available for processing in CAS until they're loaded into the memory space. The data source can
store physical files, like Excel files and SAS
data sets or SASHDAT files. But it can also store
connection information that allows you to make
high-speed connections to relational databases
and other data sources. Now, CASLIBs have
two different scopes. A Session Scope means that
when you assign that CASLIB, it will be active only for as
long as the session remains active. And as soon as you
terminate this CAS session, the session-scoped CAS
library is terminated also, and its tables
disappear from memory. This works much like the WORK library, for instance, in traditional Base SAS. Conversely, a library
can be assigned in CAS with Global Scope. Now you have to have
permission to create a globally-scoped library. And you may either
belong to a group that has been granted such permission
by the SAS administrator, or perhaps, you're the SAS
administrator yourself. CAS libraries created
with Global Scope are visible to all CAS sessions that have permission. You can set permissions on these. They would be visible to all CAS sessions on that particular CAS server. And they persist in RAM even
when you disconnect from CAS. So when you reconnect
to a CAS session, those Global Scope
tables are already still loaded in memory for
you to continue processing in your next session. Remember, again, that all of
this talk of data persisting is persisting in memory. Data that you saved to an
offline storage location, of course, persists. It can't be processed
until you load it back into the memory space. So what happens to a
Session Scope CASLIB when a CAS session ends? And the answer is that, it's not
available next time you log in, because Session Scope
libraries are visible only to the person
who created it and only for the duration
of the CAS session in which it was created. Now, when you connect
to CAS, there is always an active CASLIB. And there can only be one CASLIB active at any one time. Generally, if you connect to CAS and don't specify one, CAS will automatically make your
CASUSER library active for you at the start of
your CAS session. It's a place for you to
store your own private files. And you're the only one that has
access to the CASUSER library. At the invocation of
CAS, or at any time while you're in a CAS
session, you can always change the active CASLIB. So if you're starting a CAS
session with a CAS Statement, you can use the CASLIB= option
to specify a different CASLIB than CASUSER. Conversely, if the session
is already in process and you want to
switch active CASLIBs, you can use a CASLIB
statement to do that. This kind of breaks out
into three different types of CASLIBs. Your personal CASLIB,
which is global. Your personal CASLIB will
always be available to you when you log back in,
restart a CAS session. If you don't specify a CASLIB when you start your CAS session, it will automatically be the active CASLIB. And there are also
predefined CASLIBs. And these are CASLIBs that
have been defined by someone with administrative privileges. These are generally
global in scope. And they have access
controls on them to determine which users
can tap into these CASLIBs during their CAS sessions. And these are generally created
for popular data sources that many different people would
require access to frequently. And finally, there are
manually added CASLIBs. Now, you have to have
permission to do this. Not everybody will
have permission to manually add CASLIBs. But your user in this
class has this permission. And when you add
a CASLIB manually, you can create that
as a session CASLIB, which will only exist for
the duration of your session. Or you can create it as a global
CASLIB, which will persist after your session disconnects. Now, the administrator
can control who has access to that CASLIB. And we usually use these
for ad hoc data access. So we frequently do
have authorization to manually add CASLIBs. So now that I have a CASLIB active in my CAS session, how can I see what's in there from the SAS 9 side of things? We could connect to the CAS session with a CAS LIBNAME statement. We provide the LIBREF that we
want to use on the SAS side to refer to that data. In this case, we're
using CASUSER. And we specify the CAS
engine to connect to that. Now, you can stop there, put a semicolon, and the active CASLIB at the time will be connected to the CASUSER LIBREF that you see here. Or you can go a little further and explicitly specify which CASLIB you would like that LIBNAME to connect to.
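Roughly, the two forms of that LIBNAME statement look like this minimal sketch; the LIBREF name is up to you:

libname casuser cas;                   /* points to whatever caslib is currently active */
libname casuser cas caslib=casuser;    /* or name the caslib explicitly */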
When you've connected to a CASLIB with a CAS LIBNAME statement, only the tables that have already been loaded
into memory are visible. Remember, we said, that in
order to process data in CAS, the data has to be loaded
into the memory space. So it is quite possible that
you'll have several offline pieces of data stored
in this CASLIB, but when you connect with the LIBNAME, you'll see nothing, because none of them have yet been lifted into memory. Now rather than
assign a CASLIB one at a time with a
LIBNAME statement, you can use a CASLIB
statement to assign LIBREFS for all of the CASLIBs
available to you in your CAS session. You do this with the
_ALL_ naming convention and the keyword ASSIGN. If you don't want them all, but
you have three or four of them that you'd like to connect at
once, rather than running three or four separate
LIBNAME statements, you can just say CASLIB, list the CAS library names that you're interested in, and add the ASSIGN keyword.
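As a minimal sketch, the all-at-once form of the statement looks like this:

caslib _all_ assign;    /* assign a SAS libref for every caslib available in your CAS session */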
This makes all those CAS libraries available using LIBREFS that are the
same as the CASLIB name. And they'll be visible
in your SAS session. So let's take CAS
for a test spin. In SAS Studio, I'll
find the demo program. In this program, we're
setting a path macro variable that kind of points to the basis
for all of the courses here. And then we have a
LIBNAME statement that assigns a LIBREF to
point to our SAS files that are located in
the MySAS folder here. And then we're going to start up
a CAS session named MySession. We're really using all
the default settings. And we're going to
list the CASLIBs. Let's go ahead
and run that code. And you'll notice in the SAS log
that we are connected to CAS, that our CASUSER library
is now the active CASLIB. And then a list
of all the CASLIBs that are available
to me in my session. Even though CASUSER
is the default, there are several other
CASLIBs available out there. So we have connected to CAS, and we've actually assigned a LIBNAME to our Base SAS data. So here in my libraries I see MYSAS. But I don't see anything
about CAS, no CAS libraries. Well, early on we said that we could use a CAS LIBNAME statement to assign a LIBREF of
our choosing to a CAS library. Because CASUSER is my
personal library in CAS, I'm going to call the
LIBREF CASUSER also. I'm going to go ahead and
assign that LIBREF here. And now my libraries
show the CASUSER library, but there are no data files
in memory at the moment. Now, if I wanted to assign
LIBREFS to all of my CASLIBs all at once, I could use that
CASLIB statement instead. Remember, CASLIB _ALL_ ASSIGN. And when I do
this, you'll notice that there's a lot of CASLIBs
that did get assigned, but there's a couple
of notes in here that said, basically,
the CASLIB name was not valid for use as a LIBREF. You remember that
there's a restriction on LIBREFS that says we can only have eight characters in them. And the CASLIB named for model performance data was too long to make a LIBREF out of. And so the SAS LIBREF was
not automatically assigned. We'd have to do that one by
hand and give it some shortcut name of our own if we'd like. But it did expose all of
these other CASLIBs to me. You'll notice that most of
those don't have any data loaded in them right now either. Well, I wonder if there's
anything in my CASLIB CASUSER. You'll notice that there's
no content showing. Well, I can list the files in the offline storage with a LIST FILES statement in PROC CASUTIL. PROC CASUTIL is a very useful procedure for fiddling around with stuff in the CAS environment. We can list the files and the tables.
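A minimal sketch of those two LIST statements, assuming we're looking at the CASUSER caslib:

proc casutil;
   list files  incaslib="casuser";    /* data source files in the caslib's offline storage */
   list tables incaslib="casuser";    /* tables currently loaded into CAS memory */
quit;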
The tables would be things that are in memory. I'm going to go ahead
and run that list. And we've noticed
that it gives us some great information
on where the path is to the actual physical
folder that is associated with the CASUSER library. And then we have a boatload of files out there. But when we did the list for
the tables, nothing showed up. So no tables are
loaded in memory. Let's have a look at
the Public library. We can change the CASLIB to
Public by restarting our CAS session with a new
CASLIB= specification. And then we can list the things
that are in the Public CASLIB. You'll notice that that does
set Public to my active CASLIB. Now, this one says
local equals no. That means that this
is a global library. We could just as easily list
the files and tables in Public by changing this,
if we were curious. We found that there's no
tables available in Public, but there are a couple
of offline files that are available out there
we could load into memory space if we wanted. Now we can change the
CASLIB without having to reset the CAS session
with a CASLIB statement. So we're going to make a
new CASLIB called MyPath. And this is an ad hoc CASLIB
that is local in scope. This is the data folder. We look at our folders
out here in data, there's a folder called MyCAS. We're going to point
our CASLIB at that and use that as our offline storage for the CASLIB called MyPath. And when we create a CASLIB with a CASLIB statement, it automatically becomes the active CASLIB for my CAS session.
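Here's a rough sketch of adding and later clearing an ad hoc path-based caslib like that one; the folder path is hypothetical and would match wherever your MyCAS data folder lives:

caslib mypath datasource=(srctype="path")
       path="/home/student/data/mycas";   /* hypothetical path to the MyCAS folder */

/* ... work with the caslib ... */

caslib mypath clear;    /* remove the session-scoped caslib when finished */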
With this SAS function, GETSESSOPT, you can see that the
active CASLIB is MyPath. Now what happens
when I clear it? It disappears for
my CAS session, and you'll notice that the
CAS session never leaves you without an active CASLIB. As soon as we cleared the CASLIB MyPath, CAS automatically set CASUSER, our personal CAS library, as the active CASLIB again. And that's our first dive
into using CAS libraries. Hi there. It's Mark Jordan again. In lesson two of
Programming for SAS Viya, we'll take a look
at how to load data into a CASLIB from the offline storage associated with the CASLIB, and also how to load data into the CASLIB from files and data sets available to us in the SAS Session. So now that we've learned a little bit about accessing data in CASLIBs, you'll probably want to load some files into the memory space and
do some analysis in CAS. Now, before we can do any work
with tables in CAS, of course, we have to have the
table in memory. And as you can
see in this CASLIB there are no in-memory
tables loaded. But we have plenty of
files in the server that we can use to
load tables from. The server-side files reside
in that offline storage space that belongs to the CASLIB. And they could take
many different forms. In order to do analysis
with any of those files, we need to load
them into memory. Once the files are in memory,
we no longer call them files, we call them tables. And tables are what CAS works with. Just like CASLIBs can have
a Session Scope or a Global Scope, so your tables
can also have a Session Scope or a Global Scope. The default is Session Scope. What this means is that you'll see Promoted equals No when you look at the contents of the table. And that table will be visible only to you, or whoever it was that created that table, in their CAS session. And when they
disconnect from CAS, that table is automatically
offloaded from memory. What if I need that table in subsequent sessions? What if others in my group
need to share the data with me? In that case, I'll
need a global table. A global table can be promoted
to global as you load it, or it can be promoted from an
existing Session Scope table. You'll know that a
table is global in scope because in the contents, it
will show promoted equals yes. This table will be visible
across CAS sessions. You can disconnect and
reconnect, and still see the table in memory. It will also be visible
to any other users to whom the administrator
has granted permission. And it will have all
the necessary controls like row level locking in
order to allow that table to be used simultaneously
by more than one user. And as we said before,
these tables are not dropped from memory when
you disconnect from CAS, but remain in memory so
that next time you connect, there they are. So just as a quick
refresher check. If we create a Session Scope
table, is it only for us and are we the only
ones that can see it? And will it go away when
I disconnect from CAS? The answer to that
is, yes, of course. Session Scope tables are
only visible to the user who created it. And they're automatically
unloaded from memory as soon as the CAS
session disconnects. Global tables, of course,
are visible to multiple users and will persist in memory
even if you disconnect your CAS session. Now maybe the data you want to
analyze doesn't live in CAS. It's not in the CAS offline
storage space or the data sources, but we have access
to it in our SAS Session. You can think of the SAS
Session here as the client that's talking to CAS. And so we refer to these types
of files as client-side files. We can load these client-side files into CAS tables with the PROC CASUTIL LOAD statement, using DATA= and your SAS data set name.
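A minimal sketch of that statement, assuming a SAS data set named EMPLOYEES in a MYSAS library (your library and table names may differ):

proc casutil;
   load data=mysas.employees               /* client-side SAS data set */
        outcaslib="casuser" casout="employees" replace;
quit;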
Hi there. In this demonstration, we're going to show you how to set up your
autoexec file in SAS Studio so that your CAS session
and CAS libraries automatically reconnect
themselves every time you start SAS. Then we're going to load
a client-side file, a SAS data set into CAS
and investigate the contents of the in-memory
table that we've created. In the previous
demo, we showed you how to use the path macro
variable, LIBNAME statements, and CAS and CASLIB
statements to assign LIBREFS, point to local SAS
data, and to CAS tables. We're going to copy this
code, and in the Options we're going to
choose Autoexec File. We're just going to
paste that code in there, and then we're going to save it. Now to prove this
works, I'm going to go ahead and log out of SAS
Studio, and then sign back in. Now you'll notice that
all of my CAS libraries that showed up in
the previous demo were also showing up here, as
well as the MySAS library that contains the SAS data
sets for this class. So let's load a SAS
data set into CAS. Now remember, when
we do this, it's going to take data from
an existing SAS library and it's going to load
it up into the CAS memory space as Employees. So that means that when we do this, it should show up in the CASUSER library here. The load went well. And now, we can actually
see the CASUSER table that has the employee information. And it matches the
information that was in our original Employees
table here in the SAS Session. Now before, when we listed the
tables in our CASUSER library, we got nothing from the
listing, if you remember. But when we run the
list of the tables now, we can see that the
Employee table is indeed loaded up in CAS. And you can see the path to the
folder associated with that CAS library. Now you can use PROC
CONTENTS on that table too, because CASUSER is also
a LIBREF here on the SAS side. So we've looked at it from the
CAS side using PROC CASUTIL. Take a quick look at the same
information on the SAS side using PROC CONTENTS. And you'll see that
we could access that data set either
way from the CAS side or from the SAS side. Now, let's compare PROC CASUTIL and PROC CONTENTS. Notice that the information
that we get from PROC CASUTIL is fairly extensive. And PROC CONTENTS generates
the standard information. Now, we're not going to turn off the CAS session, but we can remove the employee table from our CAS library memory space with a DROPTABLE statement.
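A minimal sketch of the two ways to unload that table, with names assumed from the demo:

proc casutil;
   droptable casdata="employees" incaslib="casuser";   /* drop the in-memory CAS table */
quit;

proc delete data=casuser.employees;    /* or use a standard Base SAS tool through the libref */
run;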
You see that it is no longer listed. And just for good measure, we
can reload the Employees table into the CAS library and manipulate that table with a standard Base SAS tool, PROC DELETE. Notice that either
way we do it, using PROC CASUTIL or
standard SAS procedures, we can manipulate,
access, and describe data on a CAS server from
our SAS Session. Now, perhaps you want to load
files from the client side that are not SAS data sets. And we could do that too. We use the LOAD FILE= version of the LOAD statement. Don't forget that
in the file name, you'll need to use a fully
qualified path to the file in your local SAS session. And then you're
going to be required to specify the name of the table
that you're going to create.
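As a rough sketch, with a hypothetical client-side path:

proc casutil;
   load file="/home/student/data/sales.xlsx"     /* fully qualified client-side path (hypothetical) */
        outcaslib="casuser" casout="sales" replace;
quit;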
What about those external files that are stored in the
CASLIB data source? Things like Excel
spreadsheets and text files. Well, you'll also
use a LOAD statement, but this one uses the CASDATA= option, indicating that the file can be found in the CASLIB's offline storage space. If it's not in the active CASLIB, you can specify INCASLIB= to let CAS know which CASLIB's offline storage has the file you're interested in. And, again, if you want to
load that table into a memory space in a different CASLIB,
you can specify the OUTCASLIB= option. You can specify a table name
that's not the same as the data source name, if you wish. And, of course, you can
promote that table on load. By default, the name
of the data source will be the same as
the name of the table in the in-memory space. But the tables are not
replaced by default. So if there's a table
existing already, you must specify REPLACE or you'll get an error.
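Putting those options together, a minimal sketch of a server-side load might look like this; the file, table, and caslib names are just examples:

proc casutil;
   load casdata="sales.xlsx" incaslib="casuser"    /* file sitting in the caslib's data source */
        outcaslib="casuser" casout="salesxlsx"
        replace;                                   /* add PROMOTE to make the loaded table global in scope */
quit;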
In a previous demo, we took a SAS data set from our SAS Session and
loaded it up as a CAS table. This is kind of referred
to as loading a client side table into CAS. In this demo, we're
going to take a look at loading a file from the
CAS's local offline storage up into memory for processing. The file we're going to use is an Excel file. You're going to see how easy it is to load an Excel file as a CAS table. And once we've
reassured ourselves that everything's well with that
table in our personal CASUSER library, we'll promote that
table into the public space so that others can
access the data too. The CASUSER CASLIB
offline storage space includes an Excel file named sales.xlsx. We'll go ahead and use PROC CASUTIL to load that file up into the CASUSER memory space as a table called SALESXLSX. Then we're going to take a
quick contents of that to see what the data looks like. And in our results, we can
see that the columns came out pretty much as we'd expect. Notice that all the
character columns have come out as VARCHAR. Nice space saving measure. And that this is a
local scoped table. Currently resides in
our CASUSER library. Having satisfied ourselves
that the data looks good, we would like to share this
data with the rest of the people in this CAS Server. Now there's a CASLIB
out there named Public that contains a bunch
of Global Scoped tables that remain in memory
for others to use. What we're going to do
is promote the table from our CASUSER library
to the Public CAS library using the same name. Now when we do this, the actual table is moved, if you will. It will appear in Public and no longer appear in CASUSER.
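A minimal sketch of that promotion, assuming the table is named SALESXLSX:

proc casutil;
   promote casdata="salesxlsx" incaslib="casuser"
           outcaslib="public"  casout="salesxlsx";   /* moves the table and makes it global in scope */
quit;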
Now, because I've already run this demo before, that table already exists
in the Public library. And one of the attributes
of a global table is that you can't replace it. In order to replace the
table in the Public library, I'm going to have
to drop it first. Now you can see that the table
has been replaced with a table that we used to have
in our CASUSER CASLIB. And just to verify that
everything went as planned, we'll take a list of the
tables in both CASUSER and in the Public library. From the output, we can see
that in the Public CASLIB, the SALESXLSX table exists. But if we scroll back and
look at the CASUSER CASLIB, there are no longer any
tables listed there. So when we promote a table
from one CASLIB to another, the table is moved and then
promoted to a global table. Now in this class, you're
using a virtual lab. You'll be logging in as student. This user ID is a
data manager who has permission to
load data into CAS and set access
permissions for others. You may need to identify which files need to be loaded, whether they're client-side or server-side, and determine the CAS library or libraries
where the in-memory tables should be stored. Finally, you'll want to
take a look at table scope and see whether the tables
scope should be local or global. After you have manipulated
and processed the data, you may wish to save the modified data off to the offline storage. And if you do that, you may want to determine whether you want to save that data in its original format or in SASHDAT
format for rapid parallel loading when needed again. So after processing the tables,
we might want to save it. And we'll show you
how that's done. Now, there's a lot
of tables loaded up in the memory space for
this particular CASLIB. And we may want to
drop them from memory or want to know when
they are dropped from memory automatically. If you have the CAS Server
restarted, of course, all in-memory tables will
be dropped from memory. But you can also use PROC
CASUTIL with a DROPTABLE statement to deliberately
or explicitly remove tables from the memory space. Before you do, you may
want to save that table off into the data source
area for later reload. And an excellent format for
storing your data is SASHDAT. This is a format that allows
the data to be quickly loaded back into
memory in parallel just by reading the header. It's typically much, much
faster than transferring data from the client side
to the server side. Now, let's have a look at how
we might save a table that is in a memory space
in a CAS library out to the permanent
storage space of the CASLIB in SASHDAT format.
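The save itself boils down to something like this sketch; the table and file names are assumed from the demo:

proc casutil;
   save casdata="salesxlsx" incaslib="public"
        outcaslib="casuser"  casout="salesxlsx.sashdat"
        replace;                                   /* write the in-memory table to SASHDAT on disk */
quit;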
To do this, we're going to use a code snippet to help us out. First, let's have a look
at the files that already exist in the CASLIBs
we've been dealing with, CASUSER and Public. And we can see that the CASUSER
library has the expected files, but there is no
SALESXLSX SASHDAT file. Now, you remember that
SASHDAT file was nice because it made for very, very
fast loading of large data sets. Now, SALES isn't a
particularly large data set, but it will do for
our demonstration. In the Public
memory space you'll see that the SALESXLSX table exists. So we want to take this table and store it back as a SASHDAT file. So I'm going to put my cursor
right here in the code window, and I'm going to find me a
code snippet that will help me save a table to the CASLIB. Now, if I Double-Click it, it
opens up in a separate window. But if I Right-Click
and choose Insert, it will insert code
into my current program. So the table name that we
want to save is SALESXLSX, and the CASLIB where it lives is Public. The CASLIB where I want to store it is my CASUSER. And we're going to call it salesxlsx.sashdat. Now that file
should be available in my CASUSER library. And a quick run of PROC
CASUTIL should prove our point. And now we see the
salesxlsx.sashdat file is stored in our CASUSER library.
And then rather than loading from an Excel
spreadsheet, next time we can load it in parallel
using the SASHDAT file. Hi. Mark Jordan, again. And in Lesson Three of
Programming for SAS Viya, we'll take a look at our
traditional SAS code, DATA Step and SQL, and see how
we might have to modify that to run in SAS Viya. Now, if you're like me, you
have extensive experience programming in DATA step and Base SAS. You may be wondering, have
my skills become obsolete? Does everything I know
go down the tubes? And that is a good question, but the answer is no. You can use much
of the same syntax that you're very much
used to using in DATA Step and run that in CAS at
much faster speeds because of the parallel processing
capabilities we find in CAS. Now, the very thing
that makes CAS fast also changes a little bit
the behavior of programs that we write and run in CAS. So we'll need to be
aware of those changes and figure out how
we want to strategize to mitigate the
effects that we don't want, while keeping all those
high speed effects that we do. So to understand what
causes the difference, a DATA Step itself is a
single-threaded process. It reads the data sequentially
one row at a time. And the instructions that
you write in a DATA Step are really instructions on
what to do to each row of data. And so this concept of
sequential single-threaded processing shouldn't
come as a surprise. But in CAS, we have
more than one worker that can be executing
that same code at once. And so the data needs to be
distributed amongst the workers so that we can maximize
our throughput. And CAS and Viya take care of
all that stuff automatically for us. And this means
that our DATA Step will run now in multiple threads
instead of in a single thread. Now, the data is
partitioned out in blocks. And this means that
not every thread is going to receive exactly the
same number of rows of data. Another interesting
thing about it is that when you have
multiple people working on the same thing in parallel,
some will finish quicker than others. So that the order
that you get things back in from the processors is
not always the same every time you run the process. So we'll look at some
of these examples and how, in some cases,
the parallel processing is completely transparent to us. In other situations, we need to take some action because of the way that threads produce their results. Let's start with a
simple Base SAS DATA Step program using a single thread. And then we'll run the
same thing straight up in CAS and Viya
and check it out. So the DATA Step is used
to manage or manipulate tables in preparation
for further analysis, generally, with some type of
analytical procedure in SAS. We can modify the values that
already exist in columns. We can compute new columns,
and we can conditionally process and produce extra
rows or only the rows that we desire. And we can combine
tables in a DATA Step. Here's a relatively
simple DATA Step. It reads in a data set
called MYCUSTOMERS, and it writes out a data
set called DEPARTMENTS. All we're doing is checking
the value of CONTINENT to set the appropriate
value for the DEPARTMENT. If both the output table and
input table in a DATA Step are CAS tables,
then this DATA Step will run in CAS automatically
without any further need for me to do anything to my code.
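A rough sketch of what such a step might look like, assuming the MYCUSTOMERS table is already loaded in the CASUSER caslib; the department values here are just placeholders:

data casuser.departments;
   set casuser.mycustomers;
   length department $ 12;
   select (continent);
      when ('North America') department = 'NA Sales';
      when ('Europe')        department = 'EU Sales';
      otherwise              department = 'Intl Sales';
   end;
   keep city continent department;
run;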
Now let's take a look at how to take an existing DATA Step program and modify it so
that it can run in CAS. The big thing that we're
going to have to remember is the DATA step has to read
from a CAS in-memory table before it could run in CAS. And ideally, any
data that it writes would also write
out to a CAS table. So first, let's take a
look at the data program and see how it works. The data program
creates an output data set called WORK.DEPARTMENTS
by reading in the customer data set from our SAS library. The select group sets
the value of DEPARTMENT depending on the CONTINENT
in the particular record. And the output
data set will only contain the variables CITY,
CONTINENT, and DEPARTMENT. And writing the values of THREAD ID and N to the log will let us know how many rows were processed by this thread. We'll run it as a traditional DATA step first, just to see how it works. You can see that the THREAD ID was 1. In other words, there's only one thread associated with a traditional DATA Step. That's true. Now we've processed
almost 200,000 records. Now we'd like to run
this program in CAS. The first thing we're
going to have to do is to make sure that the
data set that we want to read is in CAS. You can't run it in CAS
unless the data you're reading is in CAS. So I'm going to go ahead and
load the CUSTOMER data set up into my CASUSER library. And I'm going to promote
this to a global table. This means that every
time I connect to CAS, my CUSTOMER table
will already be ready for me to use in my
personal CAS library. And that seems to
have run quite well. So the next thing
that I need to do is change my DATA Step to
read from the CAS library. Now we have assigned a LIBREF
to the CASUSER library called CASUSER. And my table should be
available there as MYCUSTOMERS. So I'm going to change
this to say CASUSER, and I'm going to change
this to say MYCUSTOMERS. Now we also said, in order to get this baby to run in CAS all the way, it would be good if we wrote the result
out to a CAS library. So I'm going to write
this out to CASUSER. And that should be
all the modifications I need to get this
thing to run in CAS. Wow. A lot of processing
going on there. You can see that because we
were reading from a CAS table, writing to a CAS table, the
entire process ran in CAS. It's noted in the log. And that there were 16
threads involved in processing this data. But what about the output? Has anything changed? Well, in order to make this
work so we can compare them side by side, I've made a copy of
the original DATA Step program over here, and I've executed it. We can see that the output data contains Leinster in its first row of data. And if I rerun this
code, you'll note that the output data
always comes out in a consistent order. Leinster's always the
first city listed. When I ran the
identical code in CAS, the first city was Hanover. If I rerun that code,
the order of the rows that is coming out of
the DATA Step in CAS is different each time, due to the fact that rows are returned as soon as a thread has finished processing them. And our log confesses that
we've got 16 threads working on this problem. In the original DATA Step,
we had only one thread working on the problem. This is just an artifact of
multi-threaded processing. All right, so we've seen that in some cases, a DATA Step run sequentially in the single-threaded Base SAS environment and in the SAS Viya environment produces pretty much the same results. Maybe in a different order,
but the same results. Let's talk about
conditions that may cause the actual calculated
results to be different. Let's take a look
at an example that uses a SUM statement to create an accumulator variable. When we use a SUM statement in a DATA Step, the expression on
the right-hand side is added to the accumulator
variable on the left. And the value of that variable
is retained automatically, so it doesn't get set back to
missing every time the DATA Step iterates. Now here, we're using the
SUM statement conditionally. So the expression is only added
to the accumulator variable if the condition was true. This DATA Step creates
the accumulator variable, NAcustomers. It counts the number of
customers from North America. Both of the input and
output data sets are in SAS, so this code is going
to process sequentially in a single thread on
the SAS Compute Server. When we're all
finished, only one row will be output that has the
total value for the number of North American customers. If we change this so that the
table we're reading and writing from reside in CAS, then
this will run in CAS, and so it will also execute
in multiple threads. You can think of many copies
of this program, right? Each running independently
in multiple threads on their own blocks of the data. And when you look
at the result, you can see that each
copy of the DATA Step ran on its own thread. And when the thread had
completed processing its data, it wrote a record out. So rather than
having one record, having the total for the
entire data set MYCUSTOMERS, we have 16 different
rows of data output. One for each thread that
was doing the processing. So this is an artifact of threaded processing in general when SAS programs are processed in CAS. The data and the program are distributed across all of the nodes in the system. And after each node
finishes up its processing, it returns the results to
the system for writing out to the output table. Now when we divide and
conquer the world like this, the work process is
much, much faster. But in this
particular case, it's fast but we don't have our
final result that we were really looking for. So what do we do? Well, if you think about it,
if this was a giant table, you have only 16 rows now
of pre-summarized data that you can further
summarize to get the answer. So it might be as simple as
writing another DATA Step and summarizing those results. But how can we make
sure that DATA Step doesn't run in multiple threads
and exacerbate the problem? We want a way of making the
code run single-threaded. So with the SINGLE=YES option on the DATA statement, we can force the processing to occur on a single thread, even in CAS.
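A minimal sketch of that second, single-threaded step; the table names here are just placeholders for the per-thread output and the final total:

data casuser.na_total / single=yes;     /* force this step to run on one thread in CAS */
   set casuser.na_counts end=done;      /* the 16 pre-summarized rows */
   total + nacustomers;
   if done then output;
   keep total;
run;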
Now there are still some DATA Step statements that cannot run in CAS. If connecting to CAS from
a SAS 9.4 installation, then those DATA Step programs
will just run on SAS 9.4. If you're operating in a
pure SAS Viya environment, SAS Viya includes the
SAS Compute Server. The SAS Compute Server can run your code just as if it was running in SAS 9.4. Things that run in the Compute
Server run single threaded. Even though your DATA Step, in
this case, is running on Viya, it's not running in CAS, it's
running on the Compute Server. And it will run in
a single thread. Previously, we've seen
how running a DATA Step in multiple
threads of CAS affects the flow order of the output. Now we're going to
turn our attention to a different problem. When you're summarizing an
entire data set with a SUM statement and you run that
process in 16 threads, you get 16 individual
answers for subgroups, not the one single
answer for the total data set as you had anticipated. We'll see this in
action, and then we'll see a couple of ways we
can address this problem. Now if this DATA Step was
running in the SAS Compute Server, it would all
run in a single thread. And as a result, we would
only get one output row when the entire data
set had been processed. But if this DATA Step
is running in CAS, it will run in multiple threads. And each thread will detect
when it runs out of data and produce an
individual row of output. If we look at the
output data, we can see that rather than one row
with a sum for the entire data set, we have a series of 16
rows produced by each thread as the code was
executing in CAS. So what could we do to
resolve this problem? Well, when you're running
a DATA Step in CAS, you have the ability to
add an option to the DATA statement that specifies you
require the process to run in a single thread. Now, if we did this,
that would also have produced a single output. But there would have been
an impact on performance. So let's take a quick look at
the total cost of running both of these DATA Steps
and compare it to what it would cost to
run the entire process in a single thread all at once. So in the two-DATA-Step process, we used 0.03 seconds and another 0.01 seconds, for a total of 0.04 seconds
to do the processing. By running the entire process
in a single DATA Step, we did get the answer
without writing too much additional code, but
it required 0.09 seconds to run. That's more than twice as much time. And this is not a
very large data set. So you can see that
there are benefits to running distributed in CAS. And even if it does
require a second DATA Step, frequently, the overall savings
are well worth the effort. For a quick comparison, we'll look at the two-step process with its single row of output and the single-threaded version of the same process. And notice that the answer returned is the same, but the total time
required to acquire the answer in the two DATA
Step process was much less. You could use a DATA Step
to process data in groups or to merge two
data sets together based on the contents of
one or more variables. But the data has
to be sorted first. If the data is sorted, you can do FIRST. and LAST. processing to identify the first and last observation in each group. This can allow you to
do grouped aggregations. Sorting's a pretty resource intensive thing to do. And especially as your data gets large, this can bog down the process. In any case, after the
sorting is completed, the data is processed one
row at a time, single-threaded, from the data sets in question. Now, in CAS, the default behavior is to load your data
in a distributed fashion across the nodes. And this has no
regard for the values in any of the rows themselves. The DATA Step is executed
in different threads. So if you're thinking
about by groups, there could be members of those
groups in each of those nodes. This is the default processing
if there's no BY statement in your SAS DATA Step. In CAS, if you do have a BY
statement in the DATA Step, there is no need
to presort data. BY in CAS will ensure that
all of the rows associated with a particular group
wind up on a single node so that the group processing
that you do will be accurate. This still allows the data
to be distributed in groups across the multiple nodes. So parallel
processing can happen, speeding up your throughput. It is no longer
necessary for you to think about sorting your data before doing things like BY-group processing with FIRST. and LAST. or merges. If you apply a format to a
variable in the by group, the ordering is in accordance
with the formatted value of that by group variable. Now, the DATA Step
with the BY statement can execute in each
thread with the assurance that all of the values
for a particular group will be on an individual node. And as each thread
finishes processing, it will return its
result. In this case, the yellow DATA Step node had
the fewest rows to process, so it returned
its results first. And then thread 4, and then 3, and then thread 2, and so on. So the order that the
values are returned may be different each time, but
the by group processing values are completely accurate. Previously, we've
seen the effects on a DATA Step which was
doing an accumulation for the entire data set. Let's take a look at
a similar process, but when we're
accumulating in BY groups. Now, before we can use BY-group processing in the traditional data program, we're going to need to sort the data. Because we want to do
processing by continent, first, we'll sort the
customer data by continent. And then we'll take
the sorted data in with a BY statement in the data program and do FIRST. and LAST. processing. This basically will allow
us to make a single output row by the summarized values for
each unique value of CONTINENT in our original data. In order to be able to take
a quick look at this data when it's done, I'm going to
add a quick PROC PRINT step. Run our process and see
what the output looks like. As you can see here, we've
got one row of output for each unique value
in the CONTINENT column. And because the data
was nicely sorted and the process runs
single-threaded, the observations came out of
there in alphabetical order by continent. What would be different if
we ran this program in CAS? Well, if you remember, in CAS, we're not going to need to presort the data. So I won't need
the PROC SORT step. I'm going to copy
the DATA Step out. And I'm going to change
this so that it runs in CAS. Now you remember, we already have a copy of the CUSTOMERS table up in my CASUSER library. I'll just modify the code
to read from that table. And then because I want the
whole process to run in CAS, I'm going to write the results
out to a CAS table too.
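A rough sketch of the CAS version of that step; no PROC SORT is needed, and the table and variable names are assumed from the demo:

data casuser.continent_counts;
   set casuser.mycustomers;
   by continent;                           /* CAS groups the rows for us; no presorting required */
   if first.continent then customers = 0;
   customers + 1;
   if last.continent then output;
   keep continent customers;
run;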
You might remember us talking earlier that when we do BY-group processing in CAS, we don't have to presort the data. The CAS engine will take
care of that for us. All I have to do
is run the program. And you'll notice that we got
the same five rows of output, but they're not necessarily
in the same order. We can switch back
and forth and see that the values are the same,
but that the order of the rows is not the same. And once again,
this is a function of parallel processing. Hi there. Mark Jordan, again. In the fourth and final lesson of Programming for SAS Viya, we'll take a peek at some of our favorite Base SAS procedures, like PROC TRANSPOSE, PROC REPORT, and PROC MEANS, and see how they have been enabled to do much of their computation in CAS. And then we'll take a look at a new procedure, PROC MDSUMMARY, that is built specifically for creating summary statistics in the CAS environment. Beginning with SAS 9.4M5, many Base SAS procedures were enhanced to process data in CAS. The goal is to have CAS do
the heavy lifting, summarizing and analyzing larger data. And then downloading to
the SAS Compute Server only the much smaller summarized
data for post-processing. And post-processing
might include things like computing
additional statistics or displaying the results via
the output delivery system. The documentation for
each of the procedures will identify what procedural
functions are supported in CAS. If your program using one of these procedures includes a statement that is not supported in CAS, then the data for
that in-memory table must be transferred down
to the SAS Compute Server. And then that procedure will
run on the Compute Server using the transferred data. A lot of the procedures that
do data management things like, read table metadata,
PROC CONTENTS, and PROC DATASETS, procedures like,
PROC COPY, which make copies of the data, and PROC DELETE,
which can delete tables can all run in CAS. But the key to getting
this to run in CAS is to make sure that you're
working with CAS tables through a CAS engine LIBREF. So you can see here that for PROC CONTENTS, the DATA= option should name a CAS engine LIBREF for the CONTENTS procedure to run in CAS. For PROC COPY, it's the IN= option, and so on. Many procedures that
manipulate data like, PROC APPEND, DS2, FEDSQL, PROC
TRANSPOSE are also CAS enabled. And they can take advantage
of that massively parallel multi-threaded environment. Now PROC APPEND allows you to
append one table to another. The BASE= table gets the data
in the DATA= table appended to it. But because the CAS LIBNAME engine does not support updating a data table in place, you can't really append to
an existing table in CAS. You can use PROC APPEND to
create a whole new table if you specify the name of the BASE=
CAS library table as a table that doesn't exist yet. Then the DATA= table is copied
up in place with a new name assigned to it. As SAS programmers,
we're used to using PROC APPEND to append more
data to an existing data table. We're going to see that in CAS, appending to a table is not permitted. But we can use a DATA Step with an APPEND=YES data set option to produce a very
similar effect. First, we'll create a couple of
smaller subsets to work with. Now, this data program
breaks up information from the CARS data set into output data sets, CARS1 and CARS2, one containing only Honda, and the other only Acura cars. We can see that one
has 17 observations, and the other has seven. Now, in order to append the
data from the second table to the first, in a traditional
Base SAS environment, we would just run PROC APPEND. And when we were done,
the seven observations would have been added to
the existing observations in the first data set, for
a total of 24 observations in the resulting data. Now, let's load those data
sets right up into CAS. And we'll try the same
processes in PROC APPEND to append the data in the
second table to the first. We quickly see that update
access is not supported for tables in memory in CAS. And as a result,
PROC APPEND fails. Now one thing you can
do with PROC APPEND is you can make a
copy pretty easily. If you append an existing
table to a table that doesn't exist yet,
then PROC APPEND merely makes a copy of the
original table for you. Let's go ahead and
take a look at that. I'm going to use PROC APPEND to copy a data set up into CAS, and then compare it
to the original table. Because there was not
yet an existing table, PROC APPEND was
successful in creating a new table named CARS. How are we going to append
the CARS2 data set to CARS1? Using a traditional DATA Step,
writing to a CASUSER CARS1 table, we're going to set CARS2. But we have this APPEND equals
yes data set option applied. And the end result is
that the final caslib table has the 24 observations
in it that we wished. And we have achieved
the exact same results we would have been able to
achieve with PROC APPEND. Now, I love SQL. But our old friend PROC SQL has
not been enabled to run in CAS. What this means is that PROC SQL
procedure can read CAS tables through a CAS-styled LIBREF. But the processing
won't happen on CAS. Instead, that CAS table has to
be brought down to the Compute Server and processed there. And because CAS tables
can be pretty large, it's likely to have a
pretty dismal effect on your performance. So what's an SQL lover to do? Well, you can execute
SQL queries in CAS, but you must use FEDSQL instead. PROC FEDSQL is a SAS proprietary
implementation written to the ANSI SQL-3 standard. In SAS Viya 3.3 and above,
besides the traditional SAS double-precision floating point
and fixed width character data, FEDSQL can also support
int64, int32, varchar. So it has some advantages. Some of the other
differences include PROC FEDSQL is much more ANSI
compliant than PROC SQL was. And so things that
are very SAS-oriented, like the mnemonics we use
instead of operator symbols, are SASsy in nature and they
don't work in PROC FEDSQL. Now, the CALCULATED keyword and the remerging of calculated statistics, all of those things are not ANSI standard, and so they don't work in PROC FEDSQL. Other than its lack of support for non-ANSI extensions, you'll find that FEDSQL has a
really similar syntax to PROC SQL. However, not all the
statements and functionality available in FEDSQL
can run in CAS. The CREATE TABLE statement
that creates a table from the results of a query
expression is supported. And the DROP TABLE
statement is available along with a SELECT statement. So most of the things
you use most often will execute just fine in CAS. Some of the things that don't work in CAS include set operators like
UNION, correlated subqueries, which are terrible for
performance anyway. And a CREATE TABLE statement
with an ORDER BY clause. Dictionary queries don't
work on the CAS server, but you can query the
dictionary using FEDSQL through a CAS-style
LIBNAME statement. And the VIEW statement doesn't
work in CAS to make views. Now, if you want to tell if your PROC FEDSQL code is actually running in CAS, you can turn on the MSGLEVEL=I system option. The default value, N, only prints notes. MSGLEVEL=I will add additional
informational messages, which will include information
on whether your code ran in CAS or not. We've learned that PROC SQL
code runs single threaded and will not run in CAS. We've also heard about
this new PROC FEDSQL. That FEDSQL code is
more ANSI compliant and does run fully distributed
in the CAS environment. So let's see what it takes to convert a simple PROC SQL query to run in CAS
using PROC FEDSQL. So here's a simple
PROC SQL query that takes distinct combinations
of customer name and city and produces a report
ordered by the city value. We're only going to
include those where the continent is in Africa. Let's have a look at
the original program and how it runs. Now we can see that the values
do indeed appear to be unique. Now, how can we get this
process to run in CAS? We would just take the query
and convert it to PROC FEDSQL. Now we'll change the word SQL to FEDSQL. Of course, I'm going to add the SESSREF=MYSESSION option to point that SQL to the CAS session in which we wish to process. And then I'm going to run the query just as it is. I'm using the MSGLEVEL=I option to ask SAS to tell
me whether the PROC FEDSQL ran successfully in CAS or not. Oh, no. When I ran my query directly as
FEDSQL, I ran into a problem. And the problem is that the
EQ mnemonic operator is not recognized by FEDSQL. Remember we said that FedSQL
is very ANSI compliant. And the EQ, LT, GT type
mnemonics are not ANSI at all. They are specifically
valid only in SAS. But let's go back and change the FEDSQL code just a little bit to use the equal sign instead of the EQ operator. And that really should resolve our issue.
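The corrected query looks roughly like this sketch; the table and column names are assumed from the narration:

options msglevel=i;                       /* ask the log to report where the query runs */
proc fedsql sessref=mysession;
   select distinct name, city
      from casuser.mycustomers
      where continent = 'Africa'          /* ANSI '=' instead of the SAS EQ mnemonic */
      order by city;
quit;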
And here we can see that the process did indeed run in CAS. And that the
results produced are identical to those
produced by PROC SQL. So keeping your SQL code ANSI compliant as much as possible will make conversions to FEDSQL much simpler in the future. Now, several other
Base SAS procedures have been enabled
to process in CAS. One of these is PROC TRANSPOSE. Running PROC TRANSPOSE in CAS has several advantages over processing on the
SAS Compute platform. It reduces network traffic
and it processes much faster. If your input table and output
tables are in the CAS server, then usually the transpose
is performed entirely in CAS. Two other procedures
that are CAS enabled are PROC MEANS and PROC SUMMARY. When the input table is a CAS table, a lot of PROC MEANS processing can be performed right on the CAS server. If the results are directed back to CAS using the OUTPUT statement, sometimes, there are
some intermediate steps that need to be calculated
on the SAS Compute Server. So some of this aggregate
data may be passed back to SAS for processing before being
written out to the CAS table. On the plus side, if you
are using a BY statement in your PROC MEANS or PROC SUMMARY, you won't have to sort the data before you use the BY statement any longer. There are a few
things that you should avoid when processing in CAS. The OBS=, FIRSTOBS=, and RENAME= data set options are not supported in CAS, and they will prevent processing there. Not all statements for PROC MEANS and PROC SUMMARY are supported in CAS. If you use only the
supported statements, then processing should
happen completely in CAS. If the statement is
not supported in CAS, the data has to be pulled to
the SAS Compute Server in order for processing to occur. You will get the
correct results, but performance may suffer. In addition to some
unsupported statements, there are some
unsupported statistics. PROC MEANS and PROC SUMMARY
provide a bunch of statistics that are not supported in CAS. This means that if you use
those statistics in your code, processing of the
intermediate aggregates must be performed on
the SAS Compute Server in order to get you the
answers that you need. If you want all of the
processing performed in CAS, you can invoke one
of these actions directly using PROC CAS or one
of the many other CAS clients and languages available to you. Another way of getting
summary statistics with all of the
processing in CAS is to use PROC MDSUMMARY instead
of PROC MEANS or PROC SUMMARY. The MDSUMMARY procedure
is designed specifically to work on CAS tables. It could compute a wide variety
of basic descriptive stats for variables across all the
observations or within groups. The MDSUMMARY procedure
can only read CAS tables. And it doesn't display output,
it produces an output table. You can use that table
for further processing as you would use any other table
produced by a SAS procedure. Now let's take a look at how
to use that new MDSUMMARY procedure to produce summary
statistics for tables in memory in CAS. At first glance, the
syntax for PROC MDSUMMARY is so close to PROC
SUMMARY, that we are tempted to just change
the word SUMMARY to MDSUMMARY on our code. But if we do this, we'll
quickly find that MDSUMMARY requires OUT=. It does not produce a report, only an OUT= output table. So we'll add the OUTPUT statement with an OUT= specification to write the results to a PRICECOSTS table in CASUSER.
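A rough sketch of that step and the follow-up query, assuming a CARS table in CASUSER with retail price and cost columns named MSRP and INVOICE; the actual names in the course data may differ:

proc mdsummary data=casuser.cars;
   var msrp invoice;
   output out=casuser.pricecosts;    /* MDSUMMARY writes its statistics to a CAS table */
run;

proc fedsql sessref=mysession;
   select * from casuser.pricecosts;
quit;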
And then we'll run a quick little FEDSQL query to show us the results. You'll remember that we needed
a SESSREF to get FEDSQL to run on the CAS server. A quick look at the results shows us that PROC MDSUMMARY has produced a wide variety of statistics for retail price and cost. Starting in SAS 9.4M5, both
PROC REPORT and PROC TABULATE summarization can be executed
on the CAS server if your input data, of course,
originates in CAS. Like PROC MEANS
and PROC SUMMARY, PROC REPORT and PROC TABULATE
both have a long list of statistics that they can
analyze to produce a report. Like PROC MEANS and SUMMARY,
not all of these statistics are supported in CAS. If you use only the supported statistics, the analysis can be performed in CAS. If you choose other statistics, then the data may have to be pulled onto
the SAS Compute platform in order to complete
your report. If your PROC REPORT program contains variables with a usage of DISPLAY or ORDER,
then the data probably will have to be pulled back
to the SAS Compute platform to finish creating your output. If you intend to run your
PROC REPORT code in CAS, you should avoid using
DISPLAY or ORDER variables. Well, I've lectured through all the lectures. And now, all that's left is for you to complete the practices. While you're doing your practices, I'd like you to keep a
couple of things in mind. Number one, you should do
the practices in order. Sometimes, the output
of one of the practices is required as input for
the next practice exercise. In addition, there's a couple
of typos on the PDF course notes for this course, particularly,
in the exercises. On Practice 3.1, it tells you
to open the starter program sv03e01.sas. But it should be sv03p-- as in papa-- 01.sas. Now, the same issue with
the second exercise. The program name sv03e02 should be sv03p02. So just make sure that you
open the right program file. All of the programs on the
computer image should be fine. And I hope you enjoy
these exercises and they help set in your
mind all the things you've learned during
the last few hours about programming in SAS Viya.