Alteryx Tutorial for Beginners - 2+ Hours of Alteryx Training & Alteryx Demo

Captions
Subscribe and click the bell icon to turn on notifications. We've made the files the instructor uses in this tutorial available for free; just click the link in the video details to get them.

Hello everyone, and welcome to our course, Introduction to Alteryx. Alteryx is a self-service analytics tool that can be used for data preparation, data cleansing, ETL, and powerful analytics. With its drag-and-drop functionality, Alteryx pushes the idea of the citizen data scientist.

My name is Andrew Pearson, and I am the founder and managing director of Intelligentsia Limited, a software consulting company that has been partnered with Alteryx for several years and has worked with several of their clients, located mostly in Asia. We have several Alteryx-certified consultants working for us, and we have watched Alteryx grow from its roots as a spatial data company into one of the most powerful data and analytics tools available on the market. In early 2020, Alteryx was recognized as a leader in Gartner's 2020 Magic Quadrant for Data Science and Machine Learning Platforms; Gartner specifically recognized Alteryx as the leader with the highest ability to execute on its vision. Alteryx's easy-to-use solutions address the needs of a wide range of problem solvers, from citizen data scientists to highly trained business analysts. The Alteryx platform offers customers both a code-free and a code-friendly approach, hundreds of automation building blocks, and the ability to integrate with enterprise applications and robotic process automation (RPA) to automate machine learning pipelines and business processes.

We have included the Alteryx files for all of the exercises in these videos; they contain all the necessary data and workflows. In the first set, we introduce students to the Alteryx essentials: prepping and getting started with Designer, delving into data types, formatting data, and working through a basic workflow. In the second section, we discuss data preparation, including filtering, blending, and grouping; this section also includes three videos describing Alteryx's basic data functions. Section 3 is all about data parsing and manipulation, with lessons on how to parse and dynamically rename data, how to use the VLOOKUP and append features, and how to work with multiple fields. We include a full Alteryx workflow example here and end the section with tips and tricks to help students navigate Alteryx as skillfully as a pro. Section 4 delves into Alteryx's data visualization features. Alteryx will gladly admit that its visualization capabilities aren't up there with Qlik, Tableau, or Power BI, but they are very functional and useful for building quick reports that might not look as pretty as a Tableau dashboard yet can carry much more impressive underlying data complexity and structure. Section 5 details how to use analytic apps and how to build macros, which can help automate processes and speed up workflows. Section 6 shows students how to analyze spatial data, including creating and understanding the data, measuring area and distance, and comparing spatial objects; the exercises in this section use an interesting marketing data set from Japanese retailers. Alteryx is a powerful analytics tool, and Section 7 finishes off the course with an investigation into some of the most useful analytical models available in Alteryx, including K-Centroids clustering, k-nearest neighbors, market basket analysis, logistic regression, linear regression, and several tree-based models.
Students will get step-by-step instructions on how these analytics functions work, building on all of their previous work in these classes. We used Alteryx versions 2020.3 and 2020.4; Alteryx is forward compatible but not backward compatible, so it will be important to have the same version. I'm very excited to have you in the course, and I will see you in the first module.

Hello. In this course we are going to discuss the Alteryx essentials. So what is Alteryx? Alteryx is an end-to-end data analytics platform that allows users to prep, blend, and analyze data in a drag-and-drop user interface. How does this differ from programs you might already be using, such as Excel and other spreadsheet programs? Adding data sources to Excel requires manual copy, paste, and repeat tasks, and the same is true for data cleansing. Automating this repetitive work is possible, but you have to learn a programming language. In Alteryx, you build repeatable processes by dragging and dropping tools into your workflow, and that workflow can then be automated or scheduled to run on a regular basis, which saves a lot of time. Excel is also limited to roughly one million rows, so anything greater would have to be split into a separate file, while Alteryx can work with very large amounts of data and ships with more than 80 data connectors as well as an assortment of APIs.

Let's take a look at the platform. The blank white space in the middle is the workflow canvas; this is where we place our tools and processes. The small tab above it shows the workflow name; an asterisk there means the workflow has unsaved changes. To close one or more workflows, click the X, or right-click and select Close, Close All But This, or Close All. In the upper right corner we have the search button, where we can type keywords to look for a particular tool; it also finds Community posts, Connect assets, and Gallery items related to your keywords. On the upper left we have several menus: File, for saving, opening, and exporting your workflow; Edit, for actions such as copy, paste, undo, and redo; View, for showing and hiding the different windows; Options, for user settings and configuration; and Help, for tutorials, sample workflows, hyperlinks to the Alteryx Community, and checking for platform updates. Last is the Language menu, where we can specify the language used on the platform; Alteryx is currently available in eight languages.

Under the menus is the tool palette, which holds all the tools organized into color-coded categories. We can configure it by clicking the plus button, selecting a preferred preset or the tool categories we use, and hitting OK to apply. The window on the left is the configuration window; it displays the configuration for a selected tool, a connection, or the workflow itself if nothing is selected. Below, we have the results window, which shows us the state of the data as the workflow runs, displays any messages, warnings, or errors that occur, and gives us links to any output file that is created. Then we have the interface designer, which allows us to customize an app or macro interface; this will be explained further in a later course. Lastly, there is the overview window, a snapshot of the full workflow. You can use it to pan around the canvas if you have a big workflow; the highlighted rectangular area is the part currently in view on the canvas.
To run your workflow, click the Run button in the upper-right portion of the canvas, press Ctrl+R on your keyboard, or click Options and then Run Workflow. This starts the flow of the data from the input tools through the tools you have configured on the canvas. Do take note that this will not change anything in your original input file unless we overwrite it using an Output Data tool, and tools won't do anything to your data until you configure them.

Let's look at the configuration window. Each tool has a different configuration; if not configured properly, a tool displays a red exclamation mark on its icon to let you know it needs further action. In this window, the first icon, with a screw, contains the tool configuration we have discussed, while the next one, with the pointed arrow, is the navigation tab, which shows which tools are connected to the selected tool; we can also hide or show their connections by selecting or deselecting the Wireless checkbox. The tab icon shows the annotation settings for the tool: here we can name the tool and type an annotation or caption for it, and the annotation can be shown based on the canvas setting, always shown, or never shown. The last icon in the window leads to the Alteryx help page, which has the complete documentation for the selected tool.

Once you are done with the workflow, you can save it by navigating to the File menu and hitting Save. Your workflow is saved to your designated file path as a .yxmd file, which stores the whole workflow, including the tools used. You can also export your workflow by clicking Options and then Export Workflow; this saves your file as a .yxzp, an Alteryx package that contains your workflow (.yxmd) as well as any assets used, such as the input files in your workflow. This is useful if you are sharing your work with others in your organization.

We've covered a lot very briefly in this video; now we're going to dive into the detail of these various areas. Let's head on to the different data types that Alteryx supports. Alteryx processes values differently depending on their data type, and here are the five main types. First are strings: put simply, these are plain text, such as names, places, and descriptions; they can be alphanumeric and can also contain symbols. Next are numerics, which contain numerical values ranging from negative to positive numbers, including exponential notation. Third is date-time, which, as the name says, can only contain a date or a time. Then we have Boolean, which contains either 1 for True or 0 for False; this data type is usually used for flagging data. For example, a customer can either be subscribed as a member, where the value is 1, or not, where it is 0.
Lastly is spatial. This data type holds objects such as centroids and polygons, and the related blob type can hold an image or a sound file.

The String and the WString types are both variable length, meaning they adjust the length of the field to accommodate the entire string within it. The WString accepts any character, including wide and Unicode characters, and it's a great catch-all field type if you are not sure of the format of your incoming data.

We have several different numeric data types. The smallest is Byte, which can only accommodate small whole numbers up to 255, at 8 bits in size. Next are the integers, Int16, Int32, and Int64, which all take whole numbers, with their sizes indicated in their names: 16, 32, and 64 bits respectively. Fixed Decimal accommodates a number with a fixed number of decimal places, which is useful for currencies or measurements; you can format this data type's length and decimal point. Float and Double are floating-point formats that dedicate a portion of their ones and zeros to placing a decimal point where necessary. A float's decimal point can be placed in any position, and floats are mainly used to save memory in large arrays of floating-point numbers. A double also allows the decimal point in any position, but uses twice as many bits as a float and is generally the default type for decimal values. Alteryx automatically reads numeric data as Double, since it can handle pretty much any scenario with its huge maximum value and 15 significant digits of precision.

The Alteryx date-time type has a specific format that may require some additional data manipulation before you can use date-time functions in the Formula tool; the formats needed are as indicated in the documentation table, and we will explain more about this type in the date-time video. Boolean is the simplest type of data: it's either 1 or 0, signifying that the data is either True or False, and it takes up only one byte per record. Spatial objects can be a centroid (a specific coordinate), a shape, or even a file; they are mostly used for spatial analysis, such as comparing distances between points.

How can we check these data types in Alteryx? When bringing in data, many users use the Select tool to check data type and structure. In our example we have a .yxdb file, an Alteryx data file, that contains several types of data fields. To add a Select tool, click on the Preparation tool set, then click the Select tool and drag it to the canvas; a dotted line shows that it can attach to the nearby Input Data tool, and it becomes a solid line once you let go of the tool. Once connected, Alteryx shows you the list of available fields from the connected data in the configuration window, including the data type for each column and its size. We will show the full functionality of this tool in a later video.
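If it helps to see these sizes and precisions concretely, here is a small Python sketch using NumPy's rough equivalents of the numeric types described above. This is an illustration only, not what Alteryx does internally.

```python
# Illustrative only: NumPy stand-ins for the Alteryx numeric types above.
import numpy as np

print(np.iinfo(np.uint8).max)   # Byte: whole numbers up to 255 (8 bits)
print(np.iinfo(np.int16).max)   # Int16: up to 32,767
print(np.iinfo(np.int64).max)   # Int64: up to ~9.2 quintillion

# Float vs. Double: a double carries roughly 15 significant digits,
# a float only about 7 -- the same trade-off described above.
x = 0.123456789012345
print(np.float32(x))  # 0.12345679  (precision lost)
print(np.float64(x))  # 0.123456789012345
```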
In creating an Alteryx workflow, we have three main components. First are our tools, which represent each step in the process. To add a tool to our workflow, drag it from the tool palette and drop it onto a blank space on the canvas. Clicking on a tool in the palette also shows a short summary of the tool and a hyperlink to a workflow example, which is useful if you want to know more about the tool's functionality and configuration. You can also add a tool by right-clicking a blank space on the canvas, clicking Insert, selecting a tool category, and clicking the tool you need. Each tool in the workflow has anchors, which serve as the inputs and outputs of data for that tool; each tool has a different number of anchors. We use these anchors to create connections between tools, allowing data to flow from an input data source to our preparation tools. By clicking the input anchor of a tool, you can view the data coming into the tool in the results window; likewise, to view the data coming out of the tool, meaning its step has already been applied, click on the output anchor.

Let's create our workflow, starting by adding an Input Data tool. Click on the In/Out tool category and drag the Input Data tool to the canvas; this tool brings a file or a database connection into the workflow. To add a file, click the arrow in the configuration window. This opens a Data Connections window that shows your recent files, your saved data, the file data sources, and the Gallery; both the Files and Data Sources tabs list all the file types and data connections supported by Alteryx. You can click Select a File and navigate to a file in your data folder, or drag and drop a file from your file explorer. If you choose an Excel file, Alteryx directs you to a window where you can either select a sheet (and optionally specify a range of cells), or import the sheet names only; since we will use the data inside the sheet, we select the former. Click OK to apply.

The Options section differs for each file type and specifies how the file is read. For our Excel file we first have the record limit, where you can enter a specific number of rows if you only want a sample of the data; the file format; the table or query, which in this case contains our sheet name; Search SubDirs, which is only used when importing multiple files from a path; and Output File Name as Field, which adds the name of your file, or your whole file path, to your data. First Row Contains Data is for files that do not contain a header, and you can also indicate the row from which to start the data import. The preview box below shows the first 100 records, which can be fewer for wider files.

To review the complete data, let's add a Browse tool; the data appears in the results window once we run the workflow. In the results window you have a count of rows and the size of the data. If you only want to see selected fields, you can show or hide them from the field selector on the upper left of the window. You can also view the contents of a cell in the cell viewer, including any white space, save this data as a new file by clicking the save icon, or copy it with the copy button. The metadata view shows the type, size, and source of each column in your data set. A colored data quality bar displays at the top of each column in the results window: red (Not OK) means the column contains values with leading or trailing white space; yellow (Null) means the column contains null values; gray (Empty) means the column contains strings with no values; and green (OK) means the column contains values without leading or trailing white space. To check the data for a specific value, type it in the search bar on the upper right and hit Enter, and all rows containing that value in any of their fields are shown. You can also sort and filter individual columns: click the three dots on the field name and choose Filter or Sort. These filter and sort changes are not a permanent step in your workflow unless you apply the tools to the canvas, which adds a Filter or Sort tool with the configuration you applied.
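For readers who like a code analogy, the Input Data record limit, the preview, and the data quality checks above look roughly like this in pandas; the file name, sheet name, and columns are hypothetical.

```python
# A rough pandas analogy of the Input Data and Browse steps; names are hypothetical.
import pandas as pd

df = pd.read_excel("customers.xlsx", sheet_name="Sheet1", nrows=1000)  # record limit
print(df.head(100))  # like the first-100-records preview box

# A crude stand-in for the data quality bar: nulls and leading/trailing white space.
print(df.isna().sum())  # null counts per column
text = df.select_dtypes("object").fillna("")
print((text != text.apply(lambda s: s.str.strip())).sum())  # whitespace issues per column
```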
Using the Browse tool also lets you look at the data profile of each column, available in the Browse configuration window. If you haven't selected any field in the Browse window, it shows a tabular list of the top occurring values in each column, including the number of instances found in the data. Selecting a specific field gives you more insight into that column, and the information varies with the column's data type. In our example we have a numeric field called Zip. The first chart contains the data quality summary of the field, showing what percentage of it is null, empty, or has leading or trailing white space. The next chart has the value statistics of the field: the bar chart groups the available data into ranges and shows which range has the most recurring values, and since this is a numeric field it also shows simple statistics such as the minimum, maximum, and average. Last is a list of top values. Now let's look at a string field and see how its profile differs. We still have the same data quality chart, but the statistical chart is different: it uses the lengths of the strings, showing the shortest and longest values as well as the first and last values, and it also shows a chart of the most frequent values.

Another way of adding data to Alteryx is via the Text Input tool, which lets you create a new stream of data without using a file or a database. You can manually type data into the configuration box; for example, I can create a simple lookup of state code and state name by typing the headers in the top row and adding the data below. Alteryx also automatically converts tabular data pasted into a Text Input: if you are using a table from a website, you can copy it directly from the web and paste it into Alteryx, and the same works for data copied from a Browse window or an Excel file. If you want to add the workflow's run time, for example as a header for a report, you can add a Date Time Now tool, which creates a field containing the date-time of when the workflow was run.

As we mentioned previously, none of this affects your data sources; the data Alteryx reads in is untouched unless you use an Output Data tool. The Output Data tool writes the results of your workflow to a new file or data source. To configure the tool, select a file path, type in a file name, select a file type, and click Save; if you are saving to an Excel file, it prompts you to specify a sheet name. If you want to replace an existing file, write the same file type with the same file name and location. To output the file, run the workflow; the results window shows a link to the file that was created. The Options section also lets you configure the maximum records to write to the file, the file format, and the output options: for this Excel file you can append the new data to the sheet, delete the current data on the sheet and add your new output (which overwrites the whole sheet), or create a new one. The remaining selections apply when saving to a SQL database or other data connections. If you want your output files named automatically, you can take the file name or file path from a field by clicking the checkbox, selecting the action to be applied, and mapping the field name.
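As a code-friendly counterpart to the Output Data tool, a pandas sketch of writing results out might look like this; the paths and sheet name are made up.

```python
# Hypothetical pandas analogy of the Output Data tool.
import pandas as pd

df = pd.DataFrame({"state_code": ["CA", "NY"], "state_name": ["California", "New York"]})

df.to_csv("states.csv", index=False)                          # write a new CSV file
df.to_excel("states.xlsx", sheet_name="States", index=False)  # Excel requires a sheet name
# Writing to the same path again simply replaces the file, the code
# analogue of overwriting an existing output file.
```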
Formatting your data before you start your analysis is necessary for a more efficient workflow. For example, a field's data type affects which functions are available for it: calculations require numeric data types and will generate an error otherwise. Formatting your data beforehand also optimizes your workflow if you reduce or remove unnecessary columns, and having the correct names for your fields reduces improper usage caused by confusing field names.

The first tool we will discuss is the Select tool, which allows you to change a column's metadata, such as its name, data type, size, and description. Let's format our sample data set to make it more applicable for future analysis. Our sample data is customer information from a telecommunications company. The file is in CSV format, so upon dragging a Select tool to the canvas you will notice that all of its fields come in as V_String data types. Let's run the workflow first to display the sample data in the results window. Looking at the Tenure column, we can change it to Int16, since it's a whole number with at most two digits; to change a data type, click the drop-down in the Select tool and select Int16. Another field we can change is the SeniorCitizen field. To look at its unique values, let's add a Browse tool and run the workflow; the profile of the SeniorCitizen field shows only two unique values, 0 and 1, which means the field can be treated as a Boolean, so set the Select tool's drop-down on the field to Bool. We can also change the size of a field. The Gender column in our data set only contains Male or Female, with 6 being the longest length, so let's change the field's size to 6: in the Select tool, click on the size and enter the number. You can observe that the changes we made show a pink tint in the background; this marks every change you've made to a field and helps you track the configurations you've done previously. In looking at the data set you might also decide to change a field name to make it more intuitive: select the field in the Select tool's configuration window and type the new name in the Rename column, as we do here for Paperless Billing and Monthly Charges. If columns are not needed for your analysis or process, they should be removed early to optimize the speed of the workflow; simply deselect those fields in the configuration window. You can also reorder the columns for ease of use or for some strategic grouping, which is essential when working with data sets that have more than 20 fields. The columns at the top of the Select tool appear on the leftmost side of the Browse results, while those at the bottom of the list appear on the rightmost. To arrange the columns, click a specific column and use the up arrow to push it toward the top or the down arrow to move it toward the bottom; to move several columns, highlight the group and click the arrows. You can also move a selection to the very top or bottom by clicking Options and selecting Move to Top or Move to Bottom, or by highlighting the fields, right-clicking, and dragging them to the new placement. Another thing you can do with the tool is sort the columns, either by field name or by field type.
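The Select tool's changes map almost one-to-one onto pandas operations. Here is a hedged sketch using the telecom example's field names; treat it as an analogy rather than what Alteryx does internally.

```python
# A pandas sketch of the Select tool steps above; data values are made up.
import pandas as pd

df = pd.DataFrame({"tenure": ["1", "24"], "SeniorCitizen": [0, 1],
                   "gender": ["Male", "Female"], "MonthlyCharges": [29.85, 56.95]})

df["tenure"] = df["tenure"].astype("int16")              # change the data type to Int16
df["SeniorCitizen"] = df["SeniorCitizen"].astype(bool)   # two unique values -> Boolean
df = df.rename(columns={"MonthlyCharges": "monthly_charges"})  # rename a field
df = df.drop(columns=["gender"])                         # deselect an unneeded field
df = df[["monthly_charges", "tenure", "SeniorCitizen"]]  # reorder the columns
```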
Back in the Select tool, you can also quickly select or deselect all fields, batch-change their field type, or add a prefix or suffix to their field names, and you can revert any changes you've made by clicking Revert All to Original Type and Size. At the end of the list there is a final column called Unknown. This is not an actual column in your data; it serves as foresight for the Select tool in case new columns are added in the future. Keep it selected to automatically add new columns, or deselect it to exclude them. You can save all of these field configurations and load them again if you need them in a new workflow with the same field template.

Since we already talked about sorting your fields, let's now look at sorting the data itself. Sorting data can help provide more meaningful insights, such as understanding data value comparisons. This time we will use the Sort tool, which arranges records in alphanumeric order based on the values of specified data fields. We usually have three types of data to sort: strings, numbers, and dates. Let's look at how each type is sorted by the tool. Here we have a small data set in a Text Input; it has customer information, dates, and a score. One thing to keep in mind is to ensure your data types are in their correct format before you sort them. Drag a Sort tool from the Preparation palette to the canvas. The tool has two configurations: the column you want to sort and the order in which to sort it. Let's try a string field first: select Region as the column name and Ascending as the order, then run the workflow. String data is sorted from A to Z based on the first letter; Descending sorts it in reverse, Z to A. Now let's try a numeric field: select Customer ID and Ascending as the order, and the numeric data is sorted from the smallest to the largest value; Descending is the reverse, with the largest at the top and the smallest at the bottom. Now let's look at how dates are sorted: select Join Date and set Ascending as the order, and the oldest date appears at the top and the most recent at the bottom. You can also sort on multiple levels: the column at the top of the list is the first level evaluated. To rearrange the levels, select a line and click the up or down button, or click the minus button to remove one. Sorting for string types is based on ASCII values by default, which means strings with punctuation as the first character appear at the top, followed by uppercase, then lowercase. If you tick Use Dictionary Order, values are sorted as they would appear in the dictionary of the selected language, regardless of letter case or symbols, and those with punctuation move to the bottom.
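To see the difference between the default ASCII-style sort and dictionary order, here is a small pandas sketch; the case-insensitive key below is only a rough stand-in for Alteryx's dictionary order, and the data is made up.

```python
# A sketch of the Sort tool's two string orders.
import pandas as pd

df = pd.DataFrame({"region": ["#East", "apple", "Zebra", "Apple"]})

# Default: ASCII-style order -- punctuation first, then uppercase, then lowercase.
print(df.sort_values("region"))

# A rough stand-in for "use dictionary order": compare letters case-insensitively.
print(df.sort_values("region", key=lambda s: s.str.lower()))
```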
We have now arranged our columns and sorted our data, but what if the data is still a mess? The name capitalization is inconsistent, there is a lot of trailing white space, there are extra symbols, and there are a bunch of null values. The Data Cleansing tool can help us fix that: it addresses common data quality issues using a variety of parameters. In this example we have a list of stores and their addresses; let's drag a Data Cleansing tool from the Preparation palette to help cleanse the data. The first option can remove null data from rows or columns, meaning it removes every row whose fields are all null, or any column that has no values at all; keep in mind it does not remove rows or columns that contain empty string values. Next we are given a list of fields where we can select specific columns to cleanse; select or deselect them with the checkboxes, and all selected fields are cleansed according to the configuration we set. The Replace Nulls option replaces the null values in a field with either a blank or a zero: blanks for string fields and zeros for numeric fields. Next we can remove unwanted characters or symbols. Leading and Trailing White Space is selected by default. Tabs, Line Breaks, and Duplicate White Space replaces any occurrence of white space with a single space, including line endings, tabs, multiple spaces, and other consecutive white space. All White Space removes every occurrence of white space. Letters removes all letters, including letters outside the Latin alphabet; Numbers removes all numbers; and Punctuation removes a listed set of punctuation characters. We can also set the capitalization for string data types: uppercase, lowercase, or title case.
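Put together, the cleansing options above translate into a few familiar pandas calls; this is a hedged analogy with hypothetical column names, not the tool's actual implementation.

```python
# A pandas sketch of the Data Cleansing options above.
import pandas as pd

df = pd.DataFrame({"store": ["  ACME corp ", None], "sales": [100.0, None]})

df = df.dropna(how="all")              # remove rows where every field is null
df["store"] = df["store"].fillna("")   # replace nulls with blanks (strings)
df["sales"] = df["sales"].fillna(0)    # replace nulls with zero (numbers)
df["store"] = df["store"].str.strip()  # remove leading and trailing white space
df["store"] = df["store"].str.title()  # modify case to Title Case
```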
Let's try to apply what we have learned in the past lessons about creating a workflow. On a new workflow, add an Input Data tool to import our data set: click the In/Out category, drag an Input Data tool to the canvas, select the Files tab, click Select File, and navigate to the data set. The data we will use is a movies-list data set stored in an Alteryx database, and there's not much to configure within the tool, so let's head on to exploring the data. The first step is understanding what we need to clean by looking at the data and its attributes, so drag a Select tool and a Browse tool to the canvas and run the workflow. From the Select tool we can observe that all fields come in at size 999 as V_Strings; we will have to reduce the sizes to optimize our workflow and correct the data types for future analysis. One step we can take is to remove the unnecessary fields: deselect the fields you don't need in the configuration window. Here we remove the adult flag, end year, number of votes, and the writers' names. Another thing we can do is reduce the size allocated to each field depending on its data, so let's look at the data profile in the Browse tool to help us decide the correct data type and size. The first field's data is already clean and unique, which is good since it's our primary key; let's just reduce it to size 10, since its maximum length is 10, and rename it Title ID to make the field name more intuitive. Next, Title Type: the data is already clean, so we only need to set its size; type 20 in the size box. Next we have Primary Title and Original Title. Both fields hold the same data, so we can remove one; set the size to 300 to accommodate long titles, and set it as a V_WString to accept letters from other languages. Then deselect Primary Title in the Select tool and rename Original Title to just Title. Then we have the Start Year field. It is all numeric, with only four digits and all whole numbers, so select Int32 in the drop-down to change its data type; notice the size is not changeable for this data type, so it defaults to 4. The next field is Runtime Minutes, also composed of whole numbers, so let's set it as Int64 to accommodate long films; as before, the size defaults to 8 for an Int64.

Next we have Genre. Its maximum length is 32, and it contains multiple genres concatenated together, so let's set it to size 40 to take in potentially longer values. Then we have Average Rating, also numeric but with one decimal place, so let's set it as a Fixed Decimal. In the Select tool, select Fixed Decimal as the data type, and we are prompted to set a size in the next cell. The number before the dot is the precision, the length of the whole number, while the number after it is the number of decimal places. The maximum precision we can enter is 50, but we'll set it to 3, since the average rating is a small number, and set 1 decimal place. Do take note that this is the only numeric data type with an adjustable length. The last field is Directors; let's leave it at the default size, since many movies have multiple directors. Run the workflow and look at how this changed our data. Looking at the Select tool's output anchor, we can see that some values in the numeric fields have become null. This is because the value these fields held before the Select tool processed them was "\N", which is a string, and any string within a field is automatically changed to null once we set the data type to numeric. So how can we populate the null values? We can change them to zero using a Data Cleansing tool. Drag a Data Cleansing tool to the canvas and connect it to the output anchor of the Select tool, select the fields to cleanse (Title, Start Year, Runtime, Genres, Average Rating, and Directors), enable Replace Nulls for both strings and numerics, and remove the white space from those fields. Check it by running the workflow: the null values are now populated as zeros. This matters if we are going to run this data through an analytical model, because null values in the variables we use will generate errors. Aside from those changes, we also want the director names in title case, but only for that field, so add another Data Cleansing tool, select just the Directors field, set Modify Case to Title Case, and run the workflow to show the results. Once that is done, we can sort our data. We want the list of movies by year, so drag in a Sort tool from the Preparation palette, set Start Year to Descending on the first level, set Title to Ascending on the second level, and run the workflow. Now we have the most recent movies first, in alphanumeric order by title, with the zero and null years pushed to the bottom. Finally, we can save this data as a Tableau data extract to be used for dashboards: add an Output Data tool, click the Files tab, select Tableau Data Extract, navigate to the path where you want to save it, type the file name, and hit Save. Run the workflow once again to generate your output, which will now be available at that path.

Let's try another example workflow, this time working with multiple files as a data source. If the files share the same template or fields, we don't need to import them into the workflow separately: we only need one Input Data tool. Paste the folder path and type *.csv; the asterisk serves as a wildcard that stands in for any run of characters, so in this case we bring in every CSV file from the path. We could also set it to customer*.csv, which would take in every CSV file whose name starts with "customer".
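In code, the wildcard input corresponds to globbing for matching files and stacking the results; the folder and pattern below are hypothetical.

```python
# A pandas analogy of the wildcard input: read every matching CSV and stack them.
import glob
import pandas as pd

files = glob.glob("data/customer*.csv")      # same idea as customer*.csv in Alteryx
frames = [pd.read_csv(f).assign(FileName=f)  # keep the source file name, like
          for f in files]                    # "output file name as field"
df = pd.concat(frames, ignore_index=True)
print(len(df), "records read from", len(files), "files")
```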
Back in the workflow, let's also set Output File Name as Field to File Name Only so we can check which files were included. Once you run the workflow, the results window shows how many records were processed from the files, and it also displays any warnings or errors encountered during reading. Let's check the Browse tool to see whether we successfully got all of our data from the six files: click on the file name field, look at the Profile tab in the configuration window, and navigate to the values chart. We can see that all files from customers 1 through 6 were taken in by the Input Data tool. Now that we have all of our customers in place, let's sort them by salary: add a Sort tool and set Salary to Descending. Then let's save this data to an Excel file. Drag an Output Data tool from the tool palette, connect it to the output anchor of the Sort tool, click the drop-down, select a file, navigate to the preferred output path, type a file name, set the save type to Microsoft Excel, specify a sheet name, and hit OK. Then run the workflow to generate your output.

Filtering is essential so you can focus on specific aspects of your data; it lets you exclude the noise to get a clearer picture of what really matters. You might want to look at just the sales of a specific year, or separate out a certain category of your analysis. In these cases the Filter tool can help. The Filter tool queries records using an expression and splits the data into two outputs: True, where the data meets the specified criteria, and False, where it does not. Select the Filter tool from the Preparation palette and drag it to the canvas. The Filter tool has one input anchor to receive the data and two output anchors to export it: the T (True) anchor outputs the data that meets the criteria you specify, while the F (False) anchor is for the data that does not. The data set we are going to use is a list of customer information, including the total purchase and the date of membership. In the Filter tool's configuration window there are two types of filters: the basic filter, for simpler queries based on a single condition, and the custom filter, which can handle complex queries and those with more than one condition across multiple columns. Let's try the basic filter first; make sure the Basic Filter radio button is selected. It is made up of three parts: first, the column or field to which you want to apply a condition; next, the operator; and third, the value the condition will be evaluated against. Let's filter for all female customers: in the drop-down select Gender, set the operator to Equals, and type Female as the value. Run the workflow; once done, the T anchor outputs the female customers, while the F anchor has the male or other values. The operators available change depending on the data type of the column selected. Now let's try filtering a numeric field: on a new Filter tool, select Total Purchase and set the condition as greater than 500.
Numeric conditions offer the ability to filter by greater than or less than the indicated value, whereas strings are compared alphabetically. Next, let's filter the Date of Membership by range, setting the start date as January 1, 2019 and the end date as October 7, 2020. For date or date-time values like this, we can set the condition based on a range, a fixed date value, or even a dynamic one such as today, tomorrow, or yesterday. To create a filter with multiple conditions, we use the custom filter: select the radio button to enable the custom filter's expression editor. Any basic filter created in the Filter tool is also replicated in the custom filter, so the first condition, that Salary is greater than 5000, is already present in the expression editor. Let's add another condition to the box: click the Variables button and select the Married column, then type the operator and condition value, equals 0, since this is a Boolean field. A small red wavy line appears underneath the start of the Married variable, signifying that something is still wrong with our expression: to properly evaluate the two conditions, we need a Boolean operator between them. To insert an operator, click the Functions button and select Operators. Alteryx has two Boolean keywords we can use for multiple conditions: AND, for satisfying both conditions, and OR, for satisfying at least one. Select the Boolean AND keyword and run the workflow. The customers on the True anchor are all unmarried with more than 5,000 in salary; those that did not meet both conditions land on the False anchor.
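The basic and custom filters above behave like Boolean masks in pandas; here is a hedged sketch reusing the example's fields with made-up values.

```python
# A pandas sketch of the Filter tool's True/False outputs.
import pandas as pd

df = pd.DataFrame({"gender": ["Female", "Male"], "salary": [6000, 4000],
                   "married": [0, 1]})

mask = df["gender"] == "Female"   # basic filter: one condition
true_side = df[mask]              # like the T anchor
false_side = df[~mask]            # like the F anchor

# custom filter: two conditions joined with a Boolean AND
custom = df[(df["salary"] > 5000) & (df["married"] == 0)]
```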
Removing duplicate values is crucial in your analysis: duplicates can damage the accuracy of your data and lead you to a wrong conclusion. You might also need to remove duplicate rows from your data set based on the value of one field or a combination of fields, because of errors in data entry or data collection. The Unique tool distinguishes whether a record is unique or a duplicate by grouping on one or more specified fields and then sorting on those fields. Drag a Unique tool from the Preparation palette to the canvas. The tool has one input anchor and two output anchors: U for the unique records found and D for the duplicates. Let's try to remove the duplicates in a sample data set of user behavior from the PC gaming hub Steam. In the Unique tool, the only thing to configure is the list of fields on which to base the deduplication. Click on User ID and run the workflow. From the U anchor we can see there are 12,393 unique users in this behavior list. Alteryx Designer processes the data from top to bottom and classifies a value as unique if it has not encountered it before; values it has already seen are marked as duplicates. Let's try adding another column in the Unique tool: click on Game and run the workflow. When multiple fields are selected, the values of the columns are combined and then compared across the data set, and each different combination of values is treated as unique; if the values differ in any column, that row is treated as a unique value. For example, in this data set where we run the Unique tool on both User ID and Game, lines 1 and 2 are treated as duplicates because they have the same user ID and game, so Alteryx keeps line 1 and marks line 2 as a duplicate; line 3, on the other hand, is still marked as unique because it has a different game from line 1. All columns in the data set can be selected for deduplication as well, which can help you identify which lines are exact duplicates: in the Unique tool, click Select All and run the workflow. The U anchor retains the first instance of each unique line, while the D anchor has the duplicates; in this case we have 710 exact duplicates in our data set of 200,000 records.
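For comparison, pandas expresses the Unique tool's grouping idea with duplicated(); the tiny Steam-style frame below is made up.

```python
# A pandas sketch of the Unique tool's U and D anchors.
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 1],
                   "game": ["Dota 2", "Dota 2", "Skyrim"]})

dupe = df.duplicated(subset=["user_id", "game"])  # True once a combination repeats
unique_rows = df[~dupe]   # like the U anchor: keeps lines 1 and 3
duplicates = df[dupe]     # like the D anchor: line 2 only

exact_dupes = df[df.duplicated()]  # all columns selected: exact duplicate lines
```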
When cleaning your data, you might find you need to remove extra headers that were included in a file, or empty rows at the bottom of the data set. Sampling data can help us easily remove those extra lines; it is also helpful when you are creating a machine learning model and want to separate out a testing subset of the data. Let's focus now on the Sample tool and how it is configured. The Sample tool limits the data stream to a specific number, percentage, or random set of rows; in addition, it applies the selected configuration per group for the columns selected in Group By. In this sample data set we are working with a list of customers by U.S. state, and the file has three extra header rows that we want to remove. First drag a Sample tool from the Preparation palette. The configuration shows six different methods for sampling data, all of which can optionally be used with the Group By function. Compared with the two tools we discussed earlier, the Sample tool has only one input anchor and one output anchor, meaning the rest of the data not included in the sample is removed from the data stream. To remove the extra headers, select Skip 1st N Rows, since the extra rows are the first three rows of the file, and enter 3 in the N text box to specify that we are skipping the first three rows; all six available options reference the value N, which is specified by entering an integer in the N box. After running the workflow, the first three extra rows have been removed from the data set. Now let's try to get the top five spending customers in each city. We will need a Sort tool before the Sample tool, since we are ranking on the value of the Total Spend field: drag a Sort tool onto the canvas and configure it to sort Spend in descending order, then add a Sample tool, configure it to First N Rows, set the value to 5, and group by City. Now run the workflow: from the Browse tool we now have the top five spending customers of each city. Because we selected the Group By function, Alteryx returns the top five results for each unique value of City; without the grouping, it would have shown only the first five lines of the data set. We can also create a randomized sample from the data using this tool: on a new Sample tool, select 1 in N Chance to Include Each Row, type 5 as the N value, and run the workflow. The data set is now reduced to 22 records. In this setting each line is individually evaluated and given a one-in-N chance of being included; we used 5, so it's a one-in-five chance, or a twenty percent probability, of being included. Do take note that this is a random sample, so every time you run the workflow there may be a different set of rows in the output.
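All three sampling configurations we used have straightforward pandas analogies; the frame below is fabricated for illustration.

```python
# A pandas sketch of the Sample tool configurations above.
import pandas as pd

df = pd.DataFrame({"city": ["A", "A", "A", "B", "B", "B"],
                   "spend": [10, 40, 30, 50, 20, 60]})

skipped = df.iloc[3:]                              # "skip 1st N rows", N = 3
top_n = (df.sort_values("spend", ascending=False)  # sort first, as above
           .groupby("city").head(2))               # "first N rows" grouped by city
random_rows = df.sample(frac=0.2)                  # roughly "1 in 5 chance" per row
```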
In creating your analytic workflow, you might be required to blend data from different data sources to get the information you need. To join data sources successfully, you need a designated relationship between the data sets; once that is established, a row of data in one input can be associated with a row of data in another input. For this we use the Join tool in Alteryx, which combines two inputs based on common fields between the two tables; you can also join two data streams based on record position. In this example we use two sample data sets from a casino: the first is demographics, containing a unique user ID and other demographic information, while the other is their game information aggregated to a daily level, including the user ID, date, and total stakes, winnings, and bets per day. Add a Join tool from the Join palette to blend the two data sources. The Join tool has two input anchors, Left (L) and Right (R): connect the demographics input to the left anchor and the user daily aggregation to the right anchor. It also has three output anchors, marked L, J, and R. In the configuration window we have two ways of associating data: by record position or by specific fields. We will discuss joining by specific fields first, so select the Join by Specific Field radio button. As the phrase suggests, we need a specific field or column on both data sets that matches the other; these are usually the primary and foreign keys used in your tables and data sources, and the key must be present in both. In our example we have the user ID in both data sets; do take note that the key fields you join on need the same data type in both inputs. Specify User ID as the matching field in the left and right inputs of the configuration window; the column name on the right input populates automatically if it has the same field name. We also have the option to add more columns as matching keys to make the join more restrictive. Every time an ID from the left matches one on the right, the join horizontally aligns the data in the output. The tool shows a small Venn diagram to help us understand what the outputs will look like: the L output is the data from the left input that had no match, and likewise the R output anchor outputs only the data from the right data set with no matches, while the J (Join) output anchor outputs the joined data, the rows with matching keys from both the left and right inputs. Below the diagram is an embedded Select window that works the same way as the usual Select tool configuration: you can deselect, rename, and sort the fields to shape the output. This is useful for removing duplicate field names created during the join; a duplicate field name is highlighted and gets the prefix Right_. Let's deselect the duplicates to remove them from the output data stream. Add a Browse tool to each of the output anchors; a shortcut for this is to click the tool you want to connect the Browse tools to and press Ctrl+Shift+B, which adds a Browse tool to each output anchor. Then run the workflow. We can see there was data on both the left and right anchors that did not match: on the L output we have 444 users that have no daily aggregations, while on the R output we have 64,789 daily records with no matching user. On the J output we have 1,675,407 records that matched, meaning we now have the daily aggregation data blended with the customer demographics. Now let's try joining by position. In this example we have two small data sets: the first contains the name, customer ID, gender, and join date, while the second has the name and the first purchase. Connect the first data set to the left input anchor of the Join tool and the second to the right input anchor, then select Join by Record Position in the configuration window. Let's also deselect the duplicate names in the embedded Select tool, deselecting the first and last name from the right. Add Browse tools to the outputs and run the workflow. On the J output anchor we can observe that the Join tool merged the left and right data sets horizontally, because joining by position assumes that line one on the left matches line one on the right. If you are absolutely sure that the row positions in both of your data sets are correct, this method can quickly produce the results you need.
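Here is a pandas sketch of the Join tool's L, J, and R anchors and the join-by-position variant; the keys and columns are hypothetical stand-ins for the casino example.

```python
# A pandas sketch of the Join tool's three outputs.
import pandas as pd

left = pd.DataFrame({"user_id": [1, 2, 3], "gender": ["F", "M", "F"]})
right = pd.DataFrame({"user_id": [2, 3, 4], "stakes": [10.0, 25.0, 5.0]})

joined = left.merge(right, on="user_id", how="inner")        # the J anchor
left_only = left[~left["user_id"].isin(right["user_id"])]    # the L anchor
right_only = right[~right["user_id"].isin(left["user_id"])]  # the R anchor

# join by record position: line 1 on the left lines up with line 1 on the right
by_position = pd.concat([left.reset_index(drop=True),
                         right.reset_index(drop=True)], axis=1)
```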
Aside from using the Join tool, we can also stack data streams on top of one another using a Union tool. In comparison to the Join tool, which blends data horizontally, the Union tool stacks incoming data streams vertically, appending the data sources on top of one another based on matching column names, matching column positions, or manual alignment. This method saves you from working on each data set separately and lets you create a more dynamic workflow. The Union tool is used to combine two or more data sets into a single data stream. In this example we are going to blend data from three different files: a CSV, an Excel file, and a tab-delimited text file. Drag a Union tool from the Join palette onto the canvas. The tool has one input anchor and one output anchor; however, the Union tool's input anchor is different from those on many other tools in Designer, as it is made up of multiple arrows to indicate that the tool accepts multiple incoming data streams. The first data stream connected to the tool becomes the template that determines the output column names for the rest of the files you add. Let's connect the CSV file to the Union tool; the connection line shows a number 1 to signify that this is the first data stream entering the tool (you can change the order later when we configure the Union tool). Now connect the Excel file and the text file. You can also rename the connections to identify each source easily, by clicking the connection and typing a name or number in its configuration box. In the configuration window, the Union tool has a drop-down showing the three ways you can vertically align the data. Let's try Auto Config by Name first. With this setting, Alteryx combines or stacks data based on the identical field names of the connected data sets, so make sure to check the field names of each data set before using this method. There is also a configurable message for when a field name doesn't match: under Auto Config by Name, another drop-down lets you choose whether a mismatched field name produces an error, a warning, or is ignored. An error stops the process and shows an error in the results window; a warning continues the process but still shows a message; and ignore continues the process without showing anything. You can also set it to output all fields, including the mismatched field names, or only the field names that align across the data sets. The next configuration is by position: with this method the values are combined based on the arrangement or order of the fields, regardless of their names, so it is important that the column order of the inputs matches before they enter the Union tool; you can still configure the properties for when fields differ in this section. The last method is to manually configure fields, which is extremely useful if you find that the field names of the data sets are not reliable for aligning the values. It lets you manually set which fields align, by selecting a field and using the left and right arrows, and you can also automatically arrange the fields shown here by position or by name using the drop-down. The last setting controls the order of the data in the output: enable the specific output order, then select a connection number and move it above or below another data set using the arrow buttons; keep in mind this may slow performance when running the workflow. Now add a Browse tool and run the workflow. The results window shows the 3,000 records we combined with the Union tool from source files of 1,000 records each, and the order of the data follows the output order we set earlier.
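The Union tool's by-name and by-position alignments correspond to how pandas concatenates frames; the two tiny frames below are invented.

```python
# A pandas sketch of the Union tool's alignment modes.
import pandas as pd

a = pd.DataFrame({"name": ["Ann"], "score": [90]})
b = pd.DataFrame({"score": [75], "name": ["Bo"]})  # same fields, different order

by_name = pd.concat([a, b], ignore_index=True)     # auto config by name

b.columns = a.columns                              # force positional alignment
by_position = pd.concat([a, b], ignore_index=True) # auto config by position
```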
When creating reports for your data, you may be asked to provide only the total numbers instead of every column in your table. In these circumstances you only need an overview of your data set, summarized to a certain level, and the Summarize tool helps you do that quickly and easily. The Summarize tool performs various actions on your columns of data, whether concatenating values, merging reports, or calculating averages; it can help you reframe your data to make it more informative. In this example we are going to get the total gross income of a supermarket. The sample data set contains the sales from each customer per day, including the specific branch where the items were purchased as well as the product category. To use the functions of the Summarize tool properly, we need to make sure we have the correct data type on each field, so using a Select tool we change Gross Income, COGS, Unit Price, Rating, and the other numeric values with decimal places to Double, the Quantity to Int32, and the Date to a Date type. Once we have the correct data types, drag a Summarize tool from the Transform palette and connect it to our Select tool. The first part of the Summarize configuration lists the fields available in the data stream, with each field name and data type; you can narrow this list to a specific data type using the drop-down in the upper-right corner. In this window, select the field you want to apply a function to by clicking it; to select multiple fields, hold the Ctrl key while clicking. Since we only need the total gross income, that is the only field we highlight. To perform a function or action on this field, click the drop-down in the center of the configuration window, which contains the list of all possible actions; those not available for this data type are grayed out. Here we select Sum. The selected action is added to the list below, which shows all the actions the Summarize tool will perform: the field name, the action (which you can further change from this drop-down), and the output field name of the function, which we can rename by clicking the value and typing a new name. Then run the workflow to see the results: in the results window we can see we have only one cell left in our data, the grand total of gross income.

Let's try other data types in our Summarize tool. To remove an existing action, click the specific line in the list and click the minus button. Now add the Branch field with the action Group By. The Group By action combines records with identical values in a specific field into a single record, and all of the resulting data for the records in a group is then summarized; if no Group By field is specified, the entire file is summarized, as we saw with the grand-total gross income. Then add Rating: click the Rating field, hover over the Numeric actions, and select Average, then run the workflow. Now we have the average rating of each branch, with the Branch column reduced to the three unique branch codes. Do take note that the Group By function is case sensitive, so if a lowercase "a" appeared in the Branch column, it would add another row to this output rather than merging with the uppercase "A". The average rating was also computed per branch because we used a Group By function. Let's try adding a date field to the Summarize tool: click Date, and you can observe it has fewer available actions than a numeric data type. Click the Max action and run the workflow, and now we have the last transaction date for each branch. Next, remove the last two actions and try concatenating a field: in the field list select Product Line, then in the actions drop-down hover over Strings and select Concatenate. This action joins multiple string values together in one cell. Additional action properties for Concatenate appear below: Start, for any character you want to appear at the beginning of the concatenated string; Separator, the character that appears between values, with a comma as the default; and End, for the character to appear at the end of the concatenated string. Both Start and End are blank by default, and escape characters are also supported by these properties. Run the workflow once you're done with the configuration, and all the product lines bought from each branch are concatenated into one field. As a last example for the Summarize tool, we will count the number of transactions per product line. In the configuration window, move the Product Line action up by clicking the arrow button, then change its action to Group By and remove Branch from the field list. Select Invoice ID and click Count as the action, which returns the number of records in each group. In the actions list, rename the output of the counted Invoice ID to Transaction Count, then run the workflow. The results window now shows the number of transactions for each product line.
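The Summarize actions we used, Group By, Sum, Average, Max, Concatenate, and Count, map onto a pandas groupby; the columns below mirror the supermarket example, but the data is made up.

```python
# A pandas sketch of the Summarize actions used above.
import pandas as pd

df = pd.DataFrame({"branch": ["A", "A", "B"],
                   "gross_income": [10.5, 20.0, 7.5],
                   "rating": [8.0, 9.0, 6.5],
                   "product_line": ["Food", "Tech", "Food"],
                   "invoice_id": ["i1", "i2", "i3"]})

print(df["gross_income"].sum())               # no Group By: one grand-total cell

by_branch = df.groupby("branch").agg(         # Group By branch
    avg_rating=("rating", "mean"),            # Average
    product_lines=("product_line", ", ".join) # Concatenate, comma as separator
)
tx_count = (df.groupby("product_line")        # Count per product line
              .agg(transaction_count=("invoice_id", "count")))
```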
Upon inspecting your data set, you might want to change its layout. Having the data in a horizontal format, with values included in the header, can be useful for applying machine learning models, where you might want to get the percentage for each category. On the other hand, you might want to align the data vertically, with the values in one column, so that you can easily filter by category. Whatever the reason, the Transpose and Cross Tab tools are flexible options to help you adjust the layout of your data.

We're going to start with the Cross Tab tool. This tool pivots vertical data to a horizontal layout. In this data set we have the date of sales, the segment, and the sales amount. Before pivoting the data, we need to visualize how we want the output to look. Here I want to retain the date vertically on the left and make the segments my new headers, with the sales amount as the value; for each date row, I will have the total sales amount for that date in three columns: Hardware, Software, and Service. With that in mind, we can now add a Cross Tab tool from the Transform palette and connect it to our input data. The first part of the configuration is to set up the Group By fields for your data values. Since we want the sales amount by date on each row, we're going to use Year, Month, and Date in our Group By; these columns will appear on the left side of our output. Next we are prompted to choose a column header; in this example we use the Segment field. Designer takes the unique values of the Segment field and creates new field names from those values; do take note that if your values have spaces, they become underscores, to keep the workflow clean. Next we need to choose the value. As we can observe, we can only select one field here, so your data values need to sit in a single column, like this one. We select the sales amount as the value, and this field will populate the values under each segment header in our output. Finally, we need to set a method for aggregating the values, the same as in the Summarize tool; this groups or totals the values in the column we selected for each column header, grouped by the date fields we enabled Group By on. Then run the workflow to check the results: we now have the total sales amount separated by segment and grouped by each day, just as we visualized.

The Transpose tool does the opposite of the Cross Tab; it pivots the orientation of the data from a horizontal to a vertical axis. Same as earlier, we want to visualize how the output should look. Here our input data set consists of the channel and region, with each product segment in its own column and the total sales of each as the values. We want the output to retain the channel and region on the left, put all of the segments in just one column, and put the total sales values in a new column. Let's drag a Transpose tool from the Transform palette and connect it to our data source. There are only two parts to configure on this tool: the key columns and the data columns. Since there is no limit on how many fields you can use, this tool is also extremely useful for extracting non-conforming data. The first part is the key columns; these are the columns or fields that you want to remain on the horizontal axis. In this example we want the channel and region retained in the same format, so select Channel and Region in the list. The second part is the data columns: the headers or field names will become values in one column, and their values, the sales amounts, will be listed in another column. The last thing to configure is the prompt for a missing column in the data: it can either throw an error, a warning, or just ignore the missing column. Let's set it to Warning and run the workflow. There were no warnings in our process, which means that, fortunately, we did not have any missing columns in our data set. In the output we now have the segments in one column and the values in another. As you can see, Alteryx automatically names the fields Name and Value, so we still need to rename them in a Select tool. Add a Select tool after the Transpose tool and rename Name to Product Segment and Value to Total Sales, and now we have completed our pivoted data set.
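As a quick sketch of the Cross Tab pivot, with hypothetical sales figures:

  Input (vertical):                      Output (horizontal):
    Date        Segment   Sales            Date        Hardware  Software  Service
    2020-01-01  Hardware  100              2020-01-01  100       50        75
    2020-01-01  Software   50
    2020-01-01  Service    75

The Transpose tool runs this pivot in the opposite direction, folding the Hardware, Software, and Service headers back into a single Name column with their values in a Value column.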
Creating an analytical workflow is not a straight-out-of-the-box procedure. Sometimes you encounter data sets with incomplete data, and you still need to apply a couple of calculations to reach your desired output. Most of the time you need to manipulate your data to make it easier to read and better organized. Alteryx features a wide range of functions we can utilize for this, and we can use the Formula tool to explore some of these basic functions. This tool can create or update columns using one or more expressions to perform a broad variety of operations.

In this example we will use a data set containing a list of books, the descriptions of those books, and their ratings; we have already converted the data types via a Select tool. Let's add a Formula tool from the Preparation palette and connect it after the Select tool. The Formula tool can be used to create new columns of data or to modify the values in an existing column. In this drop-down menu, we can either select an existing column and modify its values, or add a new column, in which case we are prompted to type a new column or field name. Let's add a new column called File Path. When creating a new column, Alteryx automatically sets it up as a V_WString and allocates a large size to it. We can choose a different data type from the list, but for this one we will keep it as a V_WString. We can also modify the size by clicking the up or down arrow buttons, or by double-clicking the number and typing a new one. The white box in the center is the expression editor; this is where we type the functions or values for the new column we have added. Let's set static data for this column: type a path into the expression editor and enclose it in double quotes to signify that it is a string. The data preview window shows a one-cell preview of your data; if it encounters any errors, it will either show no preview or display an error message. Additionally, if there is something wrong with your expression, Alteryx shows a red wavy line beneath it. Once done, run the workflow. In the output we can see the same value throughout the new field called File Path.

Let's try adding a new column with a function. On the same Formula tool, click on the plus icon to add a new expression, then add a new column called Data_Books and set it as a Boolean type. To add a function to the expression editor, we can type it in if we are already familiar with its syntax, press Control and Space on the keyboard to show a list of functions, or click on the functions icon to show the list of available categories and their functions.
Let's select Strings and use the Contains function. The first parameter of the function, String, is where we supply the string field in which we want to search for a keyword. To add the field, you can either type it in or click on the columns and constants icon and, under existing columns, select the field to use; here we are going to use the Book_Title field. Then we need to add the Target keyword we want to search for; let's type in "data". You can also reference a field here if another column contains the keyword you want to look for. As you can observe in the data preview, the first row of our data will output False; since this is a Boolean, that means the first book in our data set does not have the word "data" in its title. You can also add a 0 as an extra parameter after the target to signify that you want to search for the word in the exact letter case it is written here, meaning books where "data" appears in all caps or all lowercase would be returned as False instead of True. Here we keep it as is, since the default is not case sensitive.

We can save this expression for later use by clicking on the save icon, typing a file name, and hitting Save. You can then load the expression by clicking the Load button and, under saved expressions, clicking the function you just saved. Let's run the workflow to check our results. The new column was added at the end of our output; book titles that do not contain the word "data" were marked False, while those that do contain it were marked True. This reminds us that the Formula tool and its functions process data value by value, so the result will not be the same for every row unless we type in a static value, like our first example with the file path.

We can also reorder our expressions. On the configuration window, select and hold the reorder icon and drag the expression to the top or the bottom. The expressions at the top are processed first and those at the bottom last, which is essential to keep in mind when you are working with several expressions that modify the same column. You can also delete a function or expression by clicking on the delete button.

Let's discuss other basic string functions we can use from the functions icon of the Formula tool. The expressions or functions are categorized by usage and by the data type they apply to; for example, string functions are only applicable to string fields, and numeric and date-time columns will not be able to use them unless you convert them or change their data type in a Select tool. Let's add a new column and name it Description_Word_Count, then from the string functions select CountWords. This returns the count of words in the specified string field, where words are defined as characters separated by a space. We only need to set one parameter here: double-click on String, then type Description; the smart search will automatically suggest a list of possible fields and functions from the letters we have typed. Select Description and run the workflow. Now we have the word count of each book description.
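Written out as they would appear in the expression editor, the expressions so far look roughly like this; the bracketed field names are assumed spellings based on the narration:

  Contains([Book_Title], "data")       // True if the title contains "data", case insensitive
  Contains([Book_Title], "data", 0)    // the case-sensitive variant with the extra 0
  CountWords([Description])            // number of space-separated words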
Now let's try the FindString function. From the string functions, select FindString; the string field to use is Book_Title, with the word "The" as the target. Then run the workflow. This function searches for the occurrence of a particular string, the target, within another string and returns the numeric position of its placement. In the output we can see -1 and 0: 0 means it found the word at the very start of the string, whereas -1 means it did not find it anywhere in the string.

Next let's use the GetWord function. On the Formula tool, add a new column called Word_Extract, then put the GetWord function in the expression editor. GetWord has two parts: the string you need to get a specific word from, and n, the placement of the word you want to get. Let's set the title as the string and 3 as n, then run the workflow. Alteryx extracted the fourth word from each of our book titles. It took the fourth word even though we set the number 3 because the word count starts at zero: 0 is the first word, 1 is the second, and 3 is therefore the fourth word, and so on. Rows with no output mean the title did not reach a fourth word, that is, it only has three words or fewer.

Then let's look at the Left and Right string functions. Left takes a given number of characters starting from the left of the string, while Right takes a given number of characters counting from the end of the string. Let's create a new column, name it Left_Word, and add the Left function; we will use Description as the string and 10 as the number of characters to get. Then run the workflow. In the output we can see it took the first 10 characters, including the spaces between characters. The Right function works the same way except that it starts counting from the end of the string.

The Length function simply shows you how long a string is. Add the Length function in a new column called Description_Length, then set Description as the field to be measured; the output shows the total number of characters in that field. The Lowercase, Uppercase, and TitleCase functions work in a similar way to the corresponding options of the Data Cleansing tool: they set the selected text to all uppercase, all lowercase, or each word in title case. To use them, you only need to pick the function and set which field it applies to in the new column.

Let's try a Replace function and modify the book Type column. In the Formula tool, select Type as the column, then add the Replace function from the Strings category. We have three parts to set here: String, where we will apply the replacement; Target, the word, symbol, or character we need to replace; and the Replacement for the target. Set Type as the string, the underscore as the target, and a space as the replacement, then run the workflow. The Replace function replaces all instances of the underscore with a space. The ReverseString function is true to its name: it reverses all the characters in the string, so the characters at the beginning end up at the end, and so on; to configure it we only need to specify which field it applies to. The Trim, TrimRight, and TrimLeft expressions are all similar in function but differ in which end they trim: Trim removes characters from both ends, TrimRight removes characters from the right, and TrimLeft removes characters from the left. The things we need to configure with these functions are the field to apply them to and the characters we want trimmed; if we do not specify any characters to be trimmed, Alteryx defaults to trimming whitespace.
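Here are those string functions as expressions, again with the field names assumed from the narration:

  FindString([Book_Title], "The")    // 0 if the title starts with "The", -1 if not found
  GetWord([Book_Title], 3)           // the fourth word, since n is zero-based
  Left([Description], 10)            // the first 10 characters, spaces included
  Length([Description])              // total character count
  Replace([Type], "_", " ")          // every underscore becomes a space
  TrimLeft([Type])                   // strips leading whitespace when no characters are given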
Before we discuss the last string function, we will explain how a similar tool works: the Record ID tool from the Preparation tool set. The Record ID tool assigns a unique identifier to each record in a data set. Let's add it before our Formula tool. In the configuration we can set a name for the field (here we leave it as RecordID) and set a starting value; the tool increments by 1 for each record. We can also configure the type, the size, and the position where the Record ID column will be placed. Once we run the workflow, the RecordID field is automatically pushed to the top left as the first field in our data set; the ID started from 1 and incremented up to the last record, which is 271. The Record ID tool is quite similar to the UuidCreate function in Alteryx, except that UuidCreate does not rely on incrementing to create a unique identifier for each line. Let's try it in a new column called Book_ID; add the function from the Strings list. The function needs no further configuration, so let's run the workflow. As we can see from the output, it produces a unique string ID for each line. Do take note that this is randomly generated, so there will be a different set of IDs every time you run the workflow.

Applying numerical functions is also easy in Alteryx. It supports the common mathematical operators we usually use and even has expressions for trigonometric functions. In this example we are using a data set from a supermarket that has sales separated by product category. Make sure to convert your data types to numeric ones before adding a function, so the mathematical calculation executes properly; here we have converted them to Int64. Let's connect a Formula tool and try to get the total sales for each line. Add a new column called Total Sales and set it as an Int64 data type. In the expression editor, let's add all of the product category fields by clicking on the columns and constants icon and selecting each category from Fresh to Delicatessen, then add an addition operator between each of them; you can either type the plus symbol or go to the functions icon and, under Operators, select addition. Once done, the preview at the top shows us the total sales for the first row of our data. Keep in mind that Alteryx does not require any additional syntax, like an equals sign at the beginning or a semicolon at the end of the expression; we only write the calculation that will be applied for each new column. Upon running the workflow, we now have the total sales as our last column.

Let's try using multiple operators. Create a new column called Grocery_Sales_Percentage and set it as a Double. In the expression, add the field Grocery divided by the new Total Sales column we just created, then multiply it by one hundred. We also add parentheses before Grocery and after the Total Sales column to signify that we want to calculate the division first, before multiplying by one hundred. Then run the workflow. We now have the percentage of sales that the grocery category contributed to the total sales.
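Written out, the two expressions look roughly like this, with the category field names assumed from the narration:

  [Fresh] + [Milk] + [Grocery] + [Frozen] + [Detergents_Paper] + [Delicatessen]
  ([Grocery] / [Total Sales]) * 100    // the parentheses force the division to run first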
From the output we can see a red triangle at the top of each cell; this shows that the displayed value has been rounded to six decimal places for clarity. But what if we want to round to a different multiple? Alteryx has four different functions that round numbers: the Ceiling and Floor functions, which round to the nearest integer up and down respectively; SmartRound, which automatically rounds to the nearest multiple, dynamically decided by Alteryx; and Round, which lets you set a specific multiple to round to. In this example, let's use the Round function in a new column that is also a Double; we will round the Grocery_Sales_Percentage we created earlier to the nearest multiple. In the expression editor, type "round" and select the Round function from the smart search list. Replace x with the field we want to round, in this case the grocery sales percentage, then replace mult with the multiple we want to round to, in this case 10, because we want values rounded to 20, 30, 40, and so on. If you want to round just to the nearest integer, you can put 1, and you can also round to decimal places: 0.1 for one decimal place, 0.01 for two decimal places, and so on. Let's round this to the nearest ten and run the workflow. Now we have the values rounded to the nearest ten.

We can also add random numbers, for example to use as IDs in our data set. The expression Rand() returns a randomly generated number greater than or equal to zero and less than one, and RandInt returns a random integer between zero and the maximum value we specify. Do take note that these are random numbers, so every time you run the workflow there may be a different set of values.
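As expressions, with the field name assumed from earlier:

  Round([Grocery_Sales_Percentage], 10)      // nearest multiple of 10
  Round([Grocery_Sales_Percentage], 1)       // nearest integer
  Round([Grocery_Sales_Percentage], 0.01)    // two decimal places
  RandInt(100)                               // a random integer between 0 and 100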
Alteryx Basic Functions and Formulas, Part 2. Alteryx also has several conditional functions that can help you format and flag your data. Using these conditional statements, we can apply functions only to a specific category or criterion. Let's try them on this data set, which contains a list of applications published on the Google Play store; it has the app name, the ratings, and the category of each app. In a new Formula tool we are going to use a basic conditional function that will help us mark whether each application in our data set is a free app or a paid app. Drag a Formula tool onto the canvas, and in a new column called Type add a conditional function: click on the functions icon, select Conditional as the category, then click on the IF THEN function. This function has four parts or clauses: IF, THEN, ELSE, and ENDIF. Together these keywords create the framework for applying a logical change to your data.

IF is for specifying a criterion or condition; the condition can test a numeric value, check for a specific keyword, or verify a status. In our example we want to check whether the app is free or not, so we will type in the condition Price greater than zero: double-click or highlight c in the IF clause, then type the field Price followed by > 0. The next part of this function is the THEN clause. This specifies which function or value to apply to the data if it meets the condition in the IF clause; in our example, if the price is greater than zero, one dollar or more, we want to output the string value "PAID", meaning the app type is paid whenever a price above zero is indicated. In the THEN clause, double-click or highlight t, then type PAID in all caps, enclosed in quotes to signify that the value is a string. It is crucial to know that the data type of this value must match the field holding the conditional function; our Type field is in V_WString format, so it will accept the "PAID" text value. Then we have the ELSE clause, which specifies what function or value to use if the data does not meet the condition in the IF clause. In our example, the lines where the price is zero are not handled by the THEN clause, so they fall through to the ELSE clause: if the price is zero, we mark it with the text FREE. In the ELSE clause, double-click or highlight f, then replace it with FREE in all caps, enclosed in quotes. The last clause is ENDIF, which indicates that the conditional function is complete.

Let's run the workflow to check the output of our conditional function. In the results we can see that the apps priced at 0 were marked FREE, while the others, priced at 2.99 and 4.99, were marked PAID. An alternative to the IF THEN function is the inline IF statement; the difference between the two is that the inline form can only handle simple conditions, and it has a shorter syntax. Let's flag the free and paid apps again, but this time with the inline IF function. Add a new column called Type 2 in the Formula tool, then on the functions icon click on the Conditional category and select the IIF function. The IIF function has three parts to configure: bool, for the conditional criterion; x, for the value to apply if the condition is satisfied, or true; and y, for the values that do not satisfy the condition, or false. Type Price > 0 as the condition, with the text PAID for x and FREE for y, then run the workflow. The output of the inline function is now either FREE or PAID based on the price value, the same as the IF THEN function we created earlier.
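Spelled out in the expression editor, the two equivalent expressions are (field name assumed):

  IF [Price] > 0 THEN "PAID" ELSE "FREE" ENDIF
  IIF([Price] > 0, "PAID", "FREE")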
In creating your workflow, you might need to apply several criteria to your data to satisfy more complex situations; in that case a multi-conditional function is needed. The IF THEN ELSEIF function is a multi-conditional function that looks similar to the IF THEN function, but its ELSEIF clause lets you add multiple criteria to your function. As an example, let's create a multi-conditional function for grouping the app rating into three categories: 1 up to 2.9 will be marked BAD, 3 up to 3.9 will be marked AVERAGE, and 4 up to 5 will be marked GREAT. In a new column called Score_Category, add the IF ELSEIF function from the conditional functions list. The clauses of this function are similar to the IF THEN function: the first c is for the first condition, and the t after THEN is the value if the first condition is met; c2 is for the second condition, with t2 as its value; and f is for the ELSE value, the default if none of the prior conditions from c and c2 are met. We can add several more ELSEIF conditions before the ELSE clause if we are using multiple conditions.

For our category, the first condition needs to capture ratings from 1 up to 2.9, so our condition is Rating greater than or equal to 1 and Rating less than or equal to 2.9. To process multiple conditions like this, we need to add a Boolean operator between them. Alteryx has three Boolean operators: AND, OR, and NOT; you can use either the keyword or the symbol in the expression. The AND operator needs two or more conditions, like the rating in our example; it returns True only if all conditions are satisfied, and returns False if there is even one condition that is not. This fits our example, where the rating should be both greater than or equal to 1 and less than or equal to 2.9. Next is the OR operator; it also needs two or more conditions, and it returns True if at least one of the conditions is met. Last is the NOT operator; it negates the current condition and is used with only one condition, which is useful when you are testing for nulls or particular values. To add an operator, click on the functions icon and, in the Operators category, click on the Boolean keyword; you can also type the keyword or symbol directly into the expression editor between your conditions.

Then we add the value to assign when the first condition is satisfied: here we add the word BAD, enclosed in quotes, to mark the bad score category. Next is the second condition, where we want ratings from 3 up to 3.9: double-click on c2 and replace it with Rating >= 3 AND Rating <= 3.9 (we are still using the AND operator because we want to capture all values between 3 and 3.9), then replace the t2 value with the word AVERAGE, enclosed in quotes. We still have a third condition to add, so type ELSEIF, a space, and the third condition, Rating >= 4 AND Rating <= 5, then GREAT in quotes as the value to mark it with. Finally, we have the ELSE clause: replace f with NOT AVAILABLE; this will be the mark if a value satisfies none of the earlier conditions we set. Then make sure there is an ENDIF at the end, to signify that this is the last part of the conditional expression. It is easier to check each instance of the score rating and its score category using a Summarize tool, especially if you are working with a large data set. Connect a Summarize tool to the Formula tool, select Rating in the fields with Group By as the action, do the same for Score Category, then run the workflow. Now we can easily check whether the score category is correct based on the grouped rating values.
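The full expression, assembled from the steps above with the field name assumed:

  IF [Rating] >= 1 AND [Rating] <= 2.9 THEN "BAD"
  ELSEIF [Rating] >= 3 AND [Rating] <= 3.9 THEN "AVERAGE"
  ELSEIF [Rating] >= 4 AND [Rating] <= 5 THEN "GREAT"
  ELSE "NOT AVAILABLE" ENDIF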
The last conditional function we will look at is the Switch function. It is similar to the multi-conditional statement in that it can evaluate multiple criteria: the Switch function runs a list of cases, or conditions, against a column of data and returns the corresponding value. As an example, let's create a new column called Content_Code, which will hold the code for each content rating. From the functions icon, select Conditional and click on Switch. The Switch function has several parts to configure: Value is the field you want to test; Default holds the default value if the data meets none of the cases or conditions; Case1 is the first condition, Result1 is the value if Case1 is satisfied, and so on until you have completed your case list. Here we set Content_Rating as the field and the text NA as the default, meaning the content code is not available; then we type the first case, Teen, a comma, and T, meaning every Teen value it finds will be marked with a T. Then type the remaining cases and results, separated by commas, for the rest of your list. This function is a time saver when you are only testing the values of a field and marking them with new values: instead of typing a long chain of ELSEIF THEN clauses, you can use this instead. Once done, we can add another Summarize tool to make checking easier: add a new Summarize tool after the Formula tool, then group by Content Rating and also group by Content Code. Now we have a separate content code for each of the different content ratings.

We can also test for null or empty values in our conditions using the IsEmpty() and IsNull() functions: IsEmpty() tests whether the field is null or an empty string, while IsNull() tests for null values only. Let's check for null values in our Size field. In the Formula tool, select Size, then put an IF THEN function in the expression editor. For the condition, type IsNull(Size); for the THEN result, set the text "Not Available"; and for the ELSE, or default, result, put in the field Size. This marks the null values as Not Available and retains the original value of the Size field when it is not null. There are also several test functions for checking field data types: IsString() tests whether a field is a string; IsNumber() tests whether it is numeric; IsInteger() tests whether it can be converted to an integer, a number without decimal places (floats and doubles are numeric, but they are not integers); and IsSpatialObj() tests whether it is a spatial object or not.
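As expressions; only the Teen case is spelled out in the walkthrough, so the remaining cases here are hypothetical Google Play content ratings added for illustration:

  Switch([Content_Rating], "NA", "Teen", "T", "Everyone", "E", "Mature 17+", "M")
  IF IsNull([Size]) THEN "Not Available" ELSE [Size] ENDIF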
You can also convert data types in a Formula tool, for example if you want to use a string function on a numeric field. The ToString function converts a numeric parameter or field to a string, using numDec decimal places; by default, the period is used as the decimal separator. It also has other optional parameters: addThousandsSeparator, where the default of 0 formats the numeric string without a thousands separator, and the value 1 formats it with one (by default the thousands separator is a comma, unless the comma is specified as the decimal separator, in which case it is a period); and decimalSeparator, where the period is the default, and the value "," specifies the comma as the decimal separator if you want a different one. Let's try this function on a text input that has a Double data type. Add a new Formula tool and create a new column called Amount In String as a string data type with size 254, then add the ToString function; replace x with the field Amount and put 1 as the number of decimal places, then run the workflow. The value is converted to a string with one decimal place, and we can observe that values with two decimal places are truncated to one.

You can also do the reverse, converting a string field to a numeric one, using the ToNumber function. It converts a string parameter to a number, and accepts strings that can be interpreted as scientific-notation double precision. It also has several optional parameters: bIgnoreErrors, where 0 or false (the default) reports conversion error messages, and 1 or true ignores conversion errors; keepNulls, where 0 or false (the default) converts non-numeric values, including nulls, to zero, and 1 or true converts non-numeric values to null; and decimalSeparator, where the full stop or period (the default) specifies the period as the decimal separator, and the comma specifies the comma. Let's try it on a text input that has string values. Add a new Formula tool and create a new column called Payment, set it as a Double data type, then add the ToNumber function from the Conversion category, set Amount as the field to convert, and run the workflow. We can see from the warning that it encountered an error converting one of the lines; this is because our first line has the dollar symbol. Let's use a Replace function to remove it and convert the number properly: inside the ToNumber function's parentheses, add the Replace function with Amount as the string, the dollar symbol as the target, and an empty string as the replacement, then run the workflow. Now we have properly converted the string to a numeric data type.
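The two conversions as expressions (field name assumed):

  ToString([Amount], 1)                    // numeric Amount to a string with one decimal place
  ToNumber(Replace([Amount], "$", ""))     // strip the dollar sign, then convert to a number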
Date-Time Parsing and Functions. Having the correct date and time format can be crucial when you are constructing your analysis or your reports: you need the correct month, day, and year format to ensure you are using the data from the correct date value. This can be tricky when your data sources are Excel or text files, as you can't always control their incoming format. Alteryx has several tools to help us format our date and time columns. The first one we will discuss is the DateTime tool, which is used to transform date-time data to and from a variety of formats, including both expression-friendly and human-readable formats; you can also specify the language of your date-time data. Let's use it on our example here, which has a list of transaction dates and a date of payment in different date formats. This input is a CSV, so all of the fields are automatically read as strings. Let's add a DateTime tool from the Parse tool set and connect it to our input data. The first configuration of the tool sets the direction of the conversion: the first selection converts a date-time field to a string, while the next one converts a string field to a date-time format. We select string-to-date-time, since our input is a string and we want to convert it to a date-time data type. Next, a drop-down menu sets which field needs to be converted; let's select Date of Payment. Next is a text box where we indicate the new column name for the converted date column; before adding a field name, make sure it is different from the original field name, to avoid duplicate column names. Type Payment Date_1 in the text box as the new column name. The next configuration is a drop-down where you can specify the date-time language of your incoming string field; depending on the language, different formats will be available in the options below. Let's select English as the language. Then we need to specify which format in the list matches our incoming string field, Date of Payment. Looking at the data stream, our Date of Payment has four digits for the year first, separated by a dash, then the month and the day, so from the list we select yyyy-MM-dd: two uppercase M's mean the months are in two digits, and two lowercase d's mean the day of the month is in two digits as well. Once done, run the workflow. In the output, nothing appears to have changed in the format of the date, but if we add a Select tool we can see that the new column has been set to a Date data type. Do take note that Alteryx Designer can't process dates prior to January 1st, 1400.

Let's try the other date column, Transaction Date. This format cannot be converted directly to a Date data type via a Select tool, because Alteryx follows the ISO format, yyyy-MM-dd HH:mm:ss, to represent dates and times; if a date-time value is not in this format, the workflow shows warnings that it is not a valid date, and the output will be null. So let's add a new DateTime tool after the Select tool, select the same conversion direction, string to date/time format, then select Transaction Date as the string field, type Transaction Date_1 as the field name, and set the language to English. The format of Transaction Date has the day first, separated by a space, then the month and the year. This is not available in the current list, so we have to use a custom format. Selecting the custom format enables the text box below, which lets you specify your own format for the incoming string field; Alteryx Designer then displays an example based on the format you enter. For the day of the month, we can either put a single lowercase d for digits with no leading zero, or a double lowercase d for days with leading zeros; for our example we use the double lowercase d. Next is the separator between the date specifiers; here we use a space, and Alteryx also supports other separators, such as the slash, dash, newline, tab, and other characters. Next is our month specifier: we have a two-digit month with leading zeros, so we use a double uppercase M. Type the space separator again, and then for the year we can either put a double lowercase y for a two-digit year or four lowercase y's for the full four-digit year; in our example we use the four lowercase y's. If we have entered the format correctly, an equivalent example appears below. Once we're done with the configuration, we can run the workflow. Check the output in the results window and look or filter for nulls; if there are none, the date-time conversion was successful.
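To make the two conversions concrete, here is a sketch with hypothetical incoming values:

  Date of Payment:   "2019-02-15"   →  built-in format yyyy-MM-dd
  Transaction Date:  "15 02 2019"   →  custom format dd MM yyyy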
Designer also has several functions available for date-time data types; these can help you perform an action or a calculation on any date-time values. First we have DateTimeAdd. This function adds a specific interval to a date-time value. Add a Formula tool to our canvas, then create a new field called Add Month and set it as a Date data type. From the functions icon, navigate to the DateTime category and select DateTimeAdd. We have three parameters to specify in this function: dt, the date-time field we are going to add to; i, the positive or negative integer that will be added to or subtracted from the dt column (it should not have any decimal places); and u, the unit of time you want to add to the dt field, which can be years, months, days, hours, minutes, or even seconds, and should be enclosed in quotes. In the expression editor, put the Transaction Date_1 field as dt, type 1 as the integer, and type "months" as the unit, enclosed in quotes; this adds one month to our Transaction Date_1. Run the workflow and check the output. We have now calculated the date one month after each transaction date: here January 1st, 2019 plus one month became February 1st, 2019, and so on.

We can also calculate the difference between date fields using the DateTimeDiff function. This function subtracts the second argument from the first and returns the difference as an integer; the duration is returned as a number, not a string, in the specified time unit. Let's add a new field called Waiting_Days with an Int16 data type, then add the DateTimeDiff function. We have three parameters to set here: dt1 for the first date field, dt2 for the second date, and u for the unit of time in which dt1 and dt2 will be compared; same as earlier, this can be years, months, days, hours, minutes, or even seconds, and should be enclosed in quotes. For dt1 we use Payment Date_1, for dt2 we use Transaction Date_1, and then we set "days" as the unit; this calculates the number of days between the transaction date and the date of payment. After running the workflow, we now have the number of days between the two date fields.

We can also insert the date, or date and time, of the day the workflow was run, using the DateTimeNow and DateTimeToday functions. The difference between the two is that DateTimeToday returns only the date, while DateTimeNow returns both the date and the time of the workflow run. Let's add the DateTimeNow function in a field called Workflow_DateTime and set the data type as DateTime. Since this function needs no parameters, we can run the workflow right away. The exact date and time the workflow was run is now appended to our data set; if we run this workflow again tomorrow, the date will change, here becoming October 23rd, 2020.
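The three expressions from this walkthrough, with the field names assumed:

  DateTimeAdd([Transaction Date_1], 1, "months")                  // one month later
  DateTimeDiff([Payment Date_1], [Transaction Date_1], "days")    // dt1 minus dt2, in days
  DateTimeNow()                                                   // date and time of the run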
We can also get the first date of the month or the last date of the month by using the DateTimeFirstOfMonth() and DateTimeLastOfMonth() functions. And if you want to separate the year and month from your date value, so you can filter or group by them later on, you can utilize the DateTimeMonth and DateTimeYear functions. Let's create two new columns called Transaction Year and Transaction Month and set both as V_WString with 25 as the size. In the Transaction Year expression editor, add a DateTimeYear function by navigating to the functions icon, selecting the DateTime category, and clicking on DateTimeYear; we only need to specify dt, the date-time field we will extract the year from, so type Transaction Date_1 inside the parentheses. Then let's do the same for the Transaction Month column: add the DateTimeMonth function and specify Transaction Date_1 as the field. Once done, run the workflow. Now we have extracted the year and month of the transaction date column. You can extract other parts of the date in the same way, using a different unit of time with the functions DateTimeDay(), DateTimeHour(), DateTimeMinutes(), and DateTimeSeconds().
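As expressions:

  DateTimeYear([Transaction Date_1])     // the year portion, e.g. 2019
  DateTimeMonth([Transaction Date_1])    // the month portion, e.g. 1 for January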
When checking your data format, you might encounter instances where several columns are concatenated into one column. This is a common sight, since not all data available to us arrives in a clean, analysis-ready format. In this class we are going to discuss the Text To Columns tool, which offers an efficient way to separate concatenated values. The tool splits the text from one field into either separate rows or columns, where each value's separation is indicated by a delimiter. Let's look at how to configure Text To Columns by delimiting this data set, which contains a plant ID and plant information concatenated in one column. From the Parse tool set, select Text To Columns and add it to the canvas after the Input tool. The first configuration lets you choose which column to split; from the drop-down, select the Plant field. Then we have the delimiters text box. Here you indicate the character or symbol that distinguishes individual data values and marks where we can split the data; in our example the pipe symbol is used as the delimiter, so type the pipe symbol into the text box. You can also use other characters, such as commas, periods, and dashes, and whitespace characters such as tab, newline, and space can be specified using the whitespace keywords \t, \n, and \s.

Next we have two radio buttons for the split method: Split to Columns or Split to Rows. Split to Columns splits a single column of data at each instance of the specified delimiter into multiple columns, while Split to Rows splits a single column of data at each instance of the specified delimiter into multiple rows; the rows method requires a key or ID column so you don't lose track of which value came from which record. Let's try Split to Columns first. In this method we need to configure the number of columns to split the text into. In this example we want to generate three fields from the Plant column, since it holds the plant name, the family, and the scientific name, so indicate the number of columns as three. Then we have a drop-down where we choose the action Alteryx takes if it finds extra characters left over beyond the three columns we indicated. The options are: leave extra in last column, where data that extends past the split is appended to the value in the last column; drop extra with warning, which drops the extra data and generates a warning indicating that there was excess information; drop extra without warning, where data that extends past the split is dropped and no warning is generated; and error, where data that extends past the split causes an error and the workflow stops processing. Let's select leave extra in last column so we do not lose any data. Then we have the output root name text box, where we enter the name for the new columns to be generated; the columns will have this root name plus an integer in the field name, so since our root name is Plant, our generated fields will be named Plant1, Plant2, and Plant3.

There are also advanced options in the last part of the configuration window. Ignore delimiters in quotes, single quotes, parentheses, or brackets will ignore the delimiter symbol when it is enclosed in any of those symbols, meaning it is retained as part of the value instead of being treated as a separator. The last option is skip empty columns; this disregards empty values and treats the next string as the correct value for that position. Let's run the workflow to see whether we need any advanced options for our data set. Our first and fourth records have null values in the Plant2 field, and the data was pushed into Plant3; referring back to the original concatenated Plant column, we can see there are extra pipe delimiters between value 1 and value 2. So let's enable skip empty columns and rerun the workflow. Now that we have it enabled, those extra delimiters in between are ignored and the second value is aligned properly.

Now let's try the Split to Rows method. Instead of creating new columns, this creates new rows for each value. Add a new Text To Columns tool below the first one and connect it to our input data. Set the same column, Plant, to be split and the pipe as the delimiter, then click on the Split to Rows radio button and enable skip empty columns. Once done, run the workflow. From the results we can see that each value has been separated at the delimiter, but each one has been added as a new row; this is why the record ID is essential, so we can reference which line belongs to which record. We can also use this method of splitting to rows first when we are not sure how many columns or values our data has: we only need to add a Summarize tool after it. On the new Summarize tool, select the record ID, in this case Plant ID, and set its action to Group By; after that, select the field we split, in this example our Plant field, and set its action to Count, then run the workflow. Now we have the number of data values for each record ID, and we can confidently set the Split to Columns tool to three columns, since they all have the same number of values.
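A sketch of the split on one hypothetical record, using the pipe delimiter:

  Plant: "Maple|Sapindaceae|Acer"   →   Plant1: "Maple", Plant2: "Sapindaceae", Plant3: "Acer"

Split to Rows would instead emit three rows, one value per row, each carrying that record's Plant ID.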
Now that we have separated the data, we could rename the generated fields in a Select tool, but another nifty way of applying a rename is the Dynamic Rename tool from the Developer tool set. The Dynamic Rename tool renames columns within an input data stream; you can use it to parse text files, to pull column names from the first row of data or from a description file, or to rename a pattern in the column names, such as removing a prefix or suffix or replacing underscores with spaces. This time we are going to use this tool to take the headers from a text input and make them the new field names of our plant data set. Make sure the fields align first: on a Select tool, deselect the Plant field that contains the concatenated text. What's left is the four field names on the left, which matches the right text input holding the four correct field names. Now let's configure the Dynamic Rename tool. For the rename method, select Take Field Names from Right Input Metadata; in this method, the selected columns are renamed using the field names from a second input connected to the R anchor, and only the field names from the second input are brought in, the data does not come through. You can also use this tool to get field names from a formula, to add a prefix or suffix to the field names, or to take field descriptions. Once we've set the rename method, we can adjust the reporting behavior for when the field names from the left do not match the right input: you can set it to warn, error, or ignore. If you want to follow the data type and size from the right input, enable the last configuration, change field type and size. Once done, run the workflow. Now we have applied the correct field names to our delimited data set.

Most analysts and data enthusiasts who regularly use Excel files are familiar with Excel's VLOOKUP function. VLOOKUP, or vertical lookup, is a function that lets you search for data in a table organized vertically; it is usually used to append a value from a source table into the table you are currently working on. Alteryx has a tool similar to the VLOOKUP function: the Find Replace tool from the Join tool set. The Find Replace tool searches for data in one field from one data stream and can either replace it or append a specified field from a separate data stream. Drag a Find Replace tool from the Join tool set onto the canvas. The tool has two input anchors. The F anchor stands for Find; this is the table or data you want to update with the results, so let's connect our company names and addresses input to the F anchor. The R anchor stands for Replace; this is the input, or source table, containing the data you want to replace with or append to the original input, so we connect our abbreviations input to the R anchor. What we want to do is change the common address words in the street address into the acronyms provided by a second sheet. This second input comes from the second sheet of our Excel file, and it contains the common address names and their abbreviations or acronyms. In this tutorial we are going to replace the long common street words with their abbreviations. For example, our first line has the address 71774 Hopker Parkway; we want to change the word Parkway into Pkwy, as indicated by our second sheet, so every instance of the word Parkway in the address column will be abbreviated to Pkwy.

Let's go over the first part of the configuration. The upper left side consists of three radio buttons that let you choose which part of the field contains the value we need to find: the beginning of the field, any part of the field, or the entire field. Set it to beginning of field if the value sits at the start of the string; for example, if you are looking for street numbers, you can find them at the beginning of the street address. Any part of field looks for the word or characters anywhere, whether at the start, the middle, or the end of the string; this is the method we will use for our example, since we want to find each instance of a common street name anywhere in the street address string.
Lastly, the entire field option makes the find-and-replace method similar to that of the Join tool: it looks for the exact value across the entirety of the column, meaning values that contain the word but are not an exact copy of it will not be matched. Let's set the radio button to any part of field. Now we have to set Find Within Field; this should be the field where we want to find or match the data values, so in this case we select Street Address. Then we set the Find Value field; this should be the field from the reference table, the data source we connected to the R anchor, so in this example we set it to Common Name. This means we want to find the common name anywhere in the street address. Next we have two optional parameters: enable case insensitive find if you want to match words regardless of letter case, and enable match whole word only if you want to search for whole words only, meaning there is whitespace before and after the value. For this example we enable both case insensitive find and match whole word only, since we are using common names.

The last part of the configuration is choosing how the matched data values are put into the current data stream. Here we have two options: replace found text within the value, or append fields to the record. Choose replace found text within the value to update the original column with the new data from the specified field; this replaces the common names found within the street address with the address acronym. Optionally, we can also enable replace multiple found items if we want to replace multiple strings in each cell, but this option should only be enabled when using the any part of field method. Choose append fields to record if you want to attach columns from the reference table to the current data stream; you can choose which fields to append by selecting and deselecting them in the list. Let's try the replace option first: set the replace field to Address Acronym, then run the workflow. Parkway has been shortened to Pkwy, the same as Lane, which has become Ln, and so on; values with no match, such as the word Park, retain their original value. Now let's configure the tool to append and see how the output differs. In the configuration window, select append fields to record, then choose the Common Name field to append. Once done, run the workflow. In the output window, at the far right of our data set, the columns we specified have been appended; the lines with null values are records that had no match anywhere in the words of their street address, and thus no matching acronym or common name.

The Find Replace tool can also work like a Join tool, matching values exactly. Let's test it on the same data set using a different reference table, which has a list of state abbreviations and state names. The goal of this example is to append the long state name to the current company data stream using the matching values in the state code, or abbreviation. On a new Find Replace tool, set it to find the entire field, set Find Within Field to State, and set Find Value to State Abbreviation. For the replace part, set it to append fields to record, then append the State Name, and run the workflow. Now we have appended the long state name to the data stream based on the exact match of the state code values.
You can always add a Filter tool after the Find Replace tool to check for null values or unmatched records. There are a few differences between the Find Replace tool and the Join tool. The first is flexibility: Find Replace can search any part of a string, while the Join tool can only match exact character-for-character values. Next is run time: a Find Replace tool can take a lot longer, since it looks for a match within each word or character of every record, while the Join tool is a straightforward process of matching value to value. Third is the output anchor: the Find Replace tool has only one output anchor, so if you append fields, you have to filter out the nulls yourself, while the Join tool has outputs for the left and right data that do not join as well as for the join matches. Fourth is the reference match: if there is a duplicate value in the reference table, the Find Replace tool uses the first value and disregards the duplicate, while the Join tool outputs all matches, including the duplicates.

If you only want to append data, not match it by value or by row, you can utilize the Append Fields tool. This tool performs like a Cartesian join, where every record from the source is joined to every record in the target data stream; if we have five records in the source and a thousand records in the target data stream, the result is five thousand records. Let's see this tool in action on the same data set we used earlier. This time we will append two records, containing the years 2019 and 2020, to each row of our company data set. Click on the Join tool set and drag an Append Fields tool onto the canvas. This tool has two input anchors, like the Find Replace tool: the T anchor for the target data stream, and the S anchor for the source, the smaller data stream holding the records to be appended. Connect the company data set to the T anchor and the text input to the S anchor. The configuration window of the Append Fields tool looks the same as that of the Select tool, but it also indicates which input each listed field came from; as with the Select tool, you can rename or deselect fields and change their data types. Aside from this window, we can also set the message to show if the tool is generating too many records: allow all appends, where all records are appended to all records with no error or warning; warn on appends of more than 16 records, where a warning is reported if more than 16 records are in the source file (S input) and the workflow continues processing; or error on appends of more than 16 records, where an error is reported if more than 16 records are in the source file and the workflow stops processing. We will set it to allow all appends, since we want the years 2019 and 2020 attached to each record, and we will not make any further changes in the embedded Select window. Upon running the workflow, we can observe from the connection progress that two records were taken from the source and 100 records from the target, for an output of 200 records; in the output window we now have a 2019 and a 2020 value appended for each record of the target company data set.

In an earlier topic we discussed the Formula tool, which let us efficiently create multiple functions for several fields in just one tool. But what if we just want to apply the same function to several columns or fields?
In an earlier topic we discussed the Formula tool, which helped us efficiently apply multiple functions to several fields in just one tool. But what if we want to apply the same function to several columns or fields? Do we have to create multiple copy-paste functions for the rest of our configuration? There is no need for that, as we have the Multi-Field Formula tool. This tool is used to create or update multiple fields using a single expression. The downside is that you can only use one function or expression per tool; if you want to apply two or more expressions using the Multi-Field Formula tool, you need to add a new tool for each.

Let's try this tool out on a workflow where we are preparing categorized supermarket sales that will be used in a clustering model. Here we want to replace all the null values with a zero, so that our later analysis won't encounter error messages when it runs. Drag a Multi-Field Formula tool from the Preparation tool set onto the canvas and connect it to the Input Data tool. The first part of the configuration window lists the fields available from the input data we connected. You can shorten this list by selecting the specific data type you want to work with; in this example all of our fields are of the Double data type, which is numeric. In the list, select Fresh, Milk, Grocery, Frozen, Detergents Paper, and Delicatessen as the fields we are going to work with. You can also choose to include dynamic or unknown fields, to take in new fields that may be added in the future.

We also have two optional settings for how we want to show our output. Enable Copy Output Fields and Add if you want to put the output of your expression into new columns; this also gives the option to add a prefix or a suffix. Let's uncheck this one, because we only want to update our current fields. You can also change the data type and size of the output columns by enabling Change Output Type To; let's keep this unchecked and proceed with our expression.

Here we have three tabs and one expression box. The first tab contains the list of variables, similar to the Variables and Constants icon in the Formula tool; it lists the current field, the original fields, and constants. Click on the plus icon of Current Field: these are the variables we will work with when applying an expression to multiple fields. _CurrentField_ uses the selected fields in the function, _CurrentFieldName_ uses the field name in the function, and _CurrentFieldType_ uses the data type of the selected fields in the function. The Functions tab, as in the Formula tool, lists the functions grouped by category, and the final tab contains the recent or saved expressions we have in Designer; you can also save and load your current expression on this tab.

Since we want to replace nulls only, we will use a conditional IF THEN ELSE function. On the Functions tab, select the IF THEN ELSE function. Our c, or condition, will check whether the current field is null, so we will use the IsNull function from the test functions. Click on the IsNull function to add it to the expression editor, then replace v with _CurrentField_ from the Variables tab, replace the t with 0, and replace the f with _CurrentField_, giving IF IsNull([_CurrentField_]) THEN 0 ELSE [_CurrentField_] ENDIF. This means that every time we find a null value in any of the selected fields, we replace it with 0; if it has a value, the original value is retained. After running the workflow, we have replaced the nulls with zeros throughout the six fields.
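Here is a minimal pandas sketch of the same null-to-zero logic applied across several columns at once; the column names are hypothetical stand-ins for the supermarket fields.

```python
import pandas as pd
import numpy as np

# Stand-ins for a few of the Double-typed supermarket fields.
df = pd.DataFrame({
    "Fresh":   [112.0, np.nan, 74.0],
    "Milk":    [np.nan, 53.0, 18.0],
    "Grocery": [36.0, 11.0, np.nan],
})

fields = ["Fresh", "Milk", "Grocery"]

# Equivalent of IF IsNull([_CurrentField_]) THEN 0
# ELSE [_CurrentField_] ENDIF, run over every selected field.
df[fields] = df[fields].fillna(0)
print(df)
```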
Another tool with a similar capability is the Multi-Row Formula tool. This tool helps us reference and compare rows using the various formula functions. It differs from the regular Formula tool and the Multi-Field Formula tool in which records it works on: rather than comparing and processing column by column, this tool works row by row.

To further explain its usage, let's use the tool on a data set that has a list of players, their login dates, and their active time while online. What we want to see in our output is the number of days between each of a player's logins, so we need to compare the dates between each line and get their difference. Drag a Multi-Row Formula tool from the Preparation tool set onto the canvas and connect it to the Select tool. The first configuration of the tool lets us choose whether to update an existing field or create a new one. Let's create a new field and name it Days Between Login, then set the type to Int32. The Num Rows configuration sets the number of row variables we can use in our expression: set it to 1 if you are going to compare row 1 to one row below or above it, set it to 2 if you want to compare every other row, and so on. We are going to set it to 1, since we are comparing each date to the next line.

Next we have an optional configuration to group by a specific field. Our data set has the player information, and we need each date computation separated by player, so we are going to group by Player ID. For those instances where the dates for a player end and can't be compared to the next one, because it's either the end of the data for that player or there are no rows above it, we can set those values with the Values for Rows that Don't Exist drop-down menu; we're going to set the value to null. Next we have the same three tabs that were available in the Multi-Field Formula tool earlier; the only thing that differs is Variables, which now has Row+1 and Row-1 for row comparison.

For our output we want the days between each row, so we are going to use the DateTimeDiff function. Add it by selecting the DateTime category on the Functions tab and selecting DateTimeDiff. Before we set the parameters of this expression, it is important to ensure that our data is sorted properly. Add a Sort tool and configure it to sort Player ID in ascending order, with Login Date in ascending order as the second level. This ensures that the rows with the same Player ID are grouped together and that each player's login dates run from the earliest date to the latest, so the date difference is computed properly. Back in the Multi-Row configuration, set dt1 to [Row+1:Login Date] and dt2 to [Row+0:Login Date], the active row, then set u, the units of difference, to days. This computes the days of difference between the date on line 2 and the date on line 1 and puts the value in the Days Between Login column. Run the workflow to check the output.

Looking at the output, we now have the difference between each date by Player ID. The difference between May 1st 2020 and May 6th 2020 is obviously 5 days, which has been saved into row 1's new column; the same goes for the difference between May 6th and June 3rd, which is saved in row 2, and so on. We can also see the null values we specified in Values for Rows that Don't Exist earlier: the computation stopped because there is a different Player ID on the next line.
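Here is a minimal pandas sketch of the same per-player, row-over-row date difference; the player IDs and dates are hypothetical stand-ins.

```python
import pandas as pd

# Stand-in login data.
logins = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2],
    "login_date": pd.to_datetime(
        ["2020-05-01", "2020-05-06", "2020-06-03",
         "2020-05-02", "2020-05-10"]),
})

# Sort by player, then by date, like the Sort tool step.
logins = logins.sort_values(["player_id", "login_date"])

# [Row+1:Login Date] minus the active row's date, grouped by player.
# The last row of each player has no Row+1, so it gets a null,
# mirroring the Values for Rows that Don't Exist setting.
logins["days_between_login"] = (
    logins.groupby("player_id")["login_date"].shift(-1)
    - logins["login_date"]
).dt.days
print(logins)
```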
We can also use this tool to populate missing values in a data set. In this Text Input we have the year, quarter, and sales for three titles, separated by quarter. The problem is that the year is not populated throughout and is only indicated on the first line of each quarter 1. Let's use the Multi-Row Formula tool to populate the null values by filling the year down.

On a new Multi-Row Formula tool, set it to update the existing field, making sure that the field Year is selected in the drop-down, since this is the field we need to fill. Set Num Rows to 1, since we are still comparing each line to its neighbor, and set Values for Rows that Don't Exist to 0 or empty. We will not use any grouping this time. In the expression editor, add the IF THEN ELSE function, replace c with IsNull([Year]), replace t with [Row-1:Year], and finally replace f with the active row's [Year]. This tests whether the row the tool is looking at is null: if it is null, it is populated from the row above; if it has a value, that value is retained. In other words, every time the tool processes a line it checks for a null; a line with a value is left as it is, but a null takes the value of the row above. This is done for each cell of the specified column until the tool reaches the last record of the data set. Run the workflow to check that we have the correct output. As expected, we have populated the rest of the null values with the correct year values.
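The same fill-down pattern is a one-liner in pandas; this sketch uses hypothetical quarterly rows.

```python
import pandas as pd
import numpy as np

# Stand-in data where the year is only present on each quarter 1 row.
df = pd.DataFrame({
    "year":    [2019, np.nan, np.nan, np.nan, 2020, np.nan],
    "quarter": [1, 2, 3, 4, 1, 2],
})

# Equivalent of IF IsNull([Year]) THEN [Row-1:Year] ELSE [Year] ENDIF:
# nulls are filled from the row above, non-nulls are kept.
df["year"] = df["year"].ffill()
print(df)
```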
Let's look at an example of how we can integrate the tools we have discussed into one workflow. The data set we are going to use is a list of movies, with the ratings, directors, and actors listed in separate files. What we want for our output is to have the data from all of the data sets together in one data stream, and to clean it as well.

As the first step, let's check our four data sets. According to the metadata in the results window, our fields are all V_String of the same size. Going back to the data tab, we can see a small yellow line on the data quality bar of all the fields, which indicates null rows. We can also see a red bar on the movie title, which indicates that some of its lines have white space. We also have three ID fields across the files: Movie ID, Director ID, and Actor ID; we will use these IDs later to join our data sets together.

Let's combine the movie titles and the ratings first. Add a Join tool to the canvas from the Join tool set; connect the movie titles to the left input anchor, while the ratings go on the right. Configure the Join tool to join by specific fields, since we have IDs on both data sets, then set the left join field to Movie ID and the right join field to Title ID. In the embedded Select window, deselect the Title ID field, then run the workflow. The connection progress shows that we have joined 46 records on the J output anchor, but there were also 10 records on the left output anchor that were not matched by Movie ID. This means the ratings for these movies are not available in the ratings file. Looking at the L output anchor, we have six rows that are null and four that have data. We want to put those four rows back into our data stream and disregard the nulls, so add a Filter tool, connect it to the left anchor, then configure it as a basic filter where Movie ID is not null. Run the workflow to check our progress. We have now successfully removed the null rows, so let's mark this data, since its ratings are not available; this will make future QA easier if we are looking for missing outputs. On a new Formula tool, add a column called Rating and set its data type to V_WString, 255 in size. For its value, set the text "Not Available", enclosed in quotes.

Now that we have it marked, let's add a Union tool to put this data back into our current data stream. Connect the output anchor of the Formula tool to the input of the Union tool, and do the same for the J anchor. You can check that the fields are aligned by looking at the Manually Configure Fields output columns list. Let's run the workflow again to see if we have done it correctly. In the output window, looking at the far right of the data set, we now have the ratings, and the lines that did not join earlier have been included with the mark Not Available.

Now let's connect the director names to the data set. There is only one Director ID per line, so we can use the Join tool to match by Director ID. On a new Join tool, connect the output anchor of the Union tool to the left input anchor, then connect the directors input data to the right input anchor. Once done, configure it to join by specific fields using the Director ID column. As we can see in the embedded Select window, there is a duplicate field in the output; we can remove it by clicking on Options, then Deselect Duplicate Fields. Then move the field Director Name so it sits just below Director ID in the list, by selecting the field and clicking the up arrow four times. Once that is done, run the workflow again. We now have three of our data sets blended, so let's add the final one.

The Actor IDs in this data set have been concatenated per movie, so this time we can't rely on the Join tool; instead we use the Find and Replace tool. Drag a Find and Replace tool onto the canvas. The left input anchor will come from the J output anchor of the earlier Join tool, while the right input comes from the actors data set. We will now configure the tool to look for any instance of an actor's ID and replace it with the corresponding actor name. To do so, set the find mode to Any Part of Field, set Find Within Field to Actor IDs, then set Find Value to the Actor ID of the right input. For the replace, choose Replace Found Text With Value, set it to Actor Name, and enable Replace Multiple Found Items. Run the workflow when you are done. Looking at the Actor IDs field, we have now replaced the IDs with each actor's name.

Since we are using full names for the directors and actors, let's ensure these fields are in the correct title case. Add a Multi-Field Formula tool to the canvas, then connect the output of the Find and Replace tool to its input. In configuring the tool, select the fields Director Name and Actor IDs in the list; since we are just updating the fields, you do not need to enable Copy Output Fields or Change Output Type. In the expression editor, add the function from the Functions tab: under the Strings category, select TitleCase. In this function, indicate the string to be the variable _CurrentField_. This will process both our Director Name and Actor IDs into title case format.
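Here is a minimal pandas sketch of those last two steps, replacing IDs anywhere in a concatenated field and title-casing the result; the IDs and names are hypothetical stand-ins.

```python
import pandas as pd

# Stand-ins for the movie rows and the ID-to-name reference table.
movies = pd.DataFrame({"actor_ids": ["a1,a2", "a2,a3"]})
actors = {"a1": "tom hanks", "a2": "meg ryan", "a3": "bill pullman"}

# Replace each ID wherever it appears in the field, loosely mirroring
# Find and Replace with Any Part of Field and Replace Multiple Found
# Items enabled.
movies["actor_ids"] = movies["actor_ids"].replace(actors, regex=True)

# TitleCase([_CurrentField_]) equivalent for the name field.
movies["actor_ids"] = movies["actor_ids"].str.title()
print(movies)
```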
Next, let's convert our Date Published field to the proper date type and format by using the DateTime tool. Add the tool to the canvas and place it after the Multi-Field Formula. In the configuration, set the conversion to String to Date/Time Format, then set the string field Date Published as the field to be converted; its new column name will be Screening Date. Once we have that selected, specify English as the language. Our Date Published string has the format day, then month, then year, all separated by periods, so we will use a custom format, since this one is not available in the list. Select Custom in the format list, then in the text box type a percent symbol followed by a lowercase d, then the period separator, then a percent symbol followed by a lowercase m, another period, then a percent symbol with an uppercase Y: %d.%m.%Y. This specifier format means the day is in two digits, as is the month, the year is in full four digits, and the separators are indicated in between. You can check that you have created the correct format by looking at the example below; it should match the format you have indicated, while the output will be in Designer's ISO date format of year-month-day.

We still have an unformatted field in our data set: the movie title. We need to trim some of its white space, as indicated by the red triangles above the cells that have white space. To do so, add a Formula tool to the canvas, set the column to Movie Title, then in the expression editor add the Trim function and specify the string to be cleansed as [Movie Title]. Run the workflow to check our progress. We now have no white space left, as shown by our data quality indicator bar, which is now all green.

Looking at the output data anchor, we still have one concatenated field left in our data set: the Actor IDs. Since we are not sure how many actors there are per movie, let's count the values first. Add a Text to Columns tool to the canvas and connect it to the Formula tool. Set the column to split to Actor IDs, with the comma symbol as the delimiter, then set it to Split to Rows and enable Skip Empty Columns; this will split Actor IDs into rows. Once done, let's add a Summarize tool so we can count how many rows each movie was split into. Put the Summarize tool after the Text to Columns tool, then select Movie ID and set its action to Group By; after that, select Actor IDs and set its action to Count Distinct Non-Null. This gives us an output of Movie ID and the count of actor IDs it has, but we still need the maximum count from this list. To do so, add a new Summarize tool, then, using the Count_Distinct_ActorIDs field, set the action to Max and run the workflow. We now have the maximum number of actors in our data set.

Let's use this information to split our original data stream into four actor columns. On a new Text to Columns tool, set the column to Actor IDs and the delimiter to the comma symbol. Now select Split to Columns, with 4 as the number of columns. Let's leave extra characters in the last column and use the word Actors as the output root name. Once done, run the workflow. Since we have completed our cleansing and blending process, we can now remove the extra columns that we don't need anymore. In a Select tool, deselect Director ID, Actor IDs, and Date Published, then move Actors1 through Actors4 below Director Name. Once done, we can save this output to our desired path and format via the Output Data tool.
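For reference, here is a minimal pandas sketch of the same final cleanup steps: parsing the day.month.year dates with a custom format, trimming white space, and splitting the concatenated actor field into four columns. The sample rows and column names are hypothetical stand-ins.

```python
import pandas as pd

# Stand-in rows mirroring the state of the data at this point.
df = pd.DataFrame({
    "movie_title":    ["  The Movie ", "Another One"],
    "date_published": ["14.02.1998", "03.11.2001"],
    "actor_ids":      ["a1,a2", "a3,a4,a5,a6"],
})

# %d.%m.%Y matches the custom day.month.year format; the parsed
# result is stored internally in ISO year-month-day order.
df["screening_date"] = pd.to_datetime(df["date_published"],
                                      format="%d.%m.%Y")

# Trim([Movie Title]) equivalent: strip leading/trailing white space.
df["movie_title"] = df["movie_title"].str.strip()

# Text to Columns with a comma delimiter, split to 4 columns.
split = df["actor_ids"].str.split(",", expand=True)
df[["Actors1", "Actors2", "Actors3", "Actors4"]] = split.reindex(
    columns=range(4))
print(df)
```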
Gaining more knowledge about the tools in Designer can help you effectively automate most of your manual processes. In doing so, your workflow expands and becomes more sophisticated as the process gets more complicated, making it more difficult to backtrack your progress or even check errors at the far end of the line. In this session we will talk about some tips you can use to make your workflow comprehensible and easier on the eyes.

Tip number one: prep your canvas. The canvas is the main thing we look at when working in Designer. We can customize its orientation and settings by navigating to Options, User Settings, Edit User Settings, then the Canvas tab. The orientation of the workflow is under Layout Default, where we can set it to either a horizontal or a vertical layout. Horizontal lets you work from left to right, while vertical lets you work from top to bottom; we recommend vertical if you are used to using the scroll buttons to navigate. You can also change the color of your canvas or add a grid to make it easier to align your tools, and there is a Snap to Grid option to make dragging and dropping tools much easier. The canvas settings also let you choose the type and color of your connection lines. Under the connection settings you can choose Auto-Route Bezier, the curvy connection line that automatically curves to its endpoint and swerves away from tools; Auto-Route Perpendicular, a pointier version of the Bezier; and Straight, which is true to its name, simply a straight line that, unlike the auto-route lines, does not swerve around lines and tools. Lastly, you can show or hide the connection progress, the small annotations showing the number of records, percentage, and record memory size; this can be useful if you are checking for duplicates or blending data from various data streams.

Tip number two: keep your favorites close. In our tool palette, the first tab is called Favorites. This tab is always positioned first and contains the tools we frequently use in our workflows and processes. To add a tool to your favorites, hover over the tool and you will see a star icon at its upper right; click on it and the tool will be added to the Favorites toolbar. Aside from the Favorites toolbar, you can also pin a specific tool set to the front of the tool palette: right-click on the tab of the tool set you want to pin, then select Pin plus the tool set name. This tool set will then be moved beside the Favorites toolbar.

Tip number three: Alteryx sticky notes. Having proper documentation is extremely important. Capturing your logic and taking notes of important steps in Alteryx Designer is quick and easy with the Comment tool, found under the Documentation tool set. It lets you add notes and images to your workflow to make your processes easier to explain. In the Documentation tool set, select the Comment tool and add it to your workflow. You can double-click on the comment box and type your notes directly, or type them in the configuration window's text box. You can also personalize the shape, color, and font of your comments. If an image is needed, you can load your own by clicking on the folder icon and selecting the image file on your PC; the image will resize according to the size of your comment box. Drag on the edges to adjust the size of the comment, and you can also hold and drag it to other parts of your workflow. But if you are adding comments for each tool, it is better to use annotations. Annotations are the small rectangles seen below each tool when the canvas Annotation option is set to Show; unlike comments, they move together with the tool. To add one, click on the tool and select the annotation icon, type the text you want to add, then choose a setting in the drop-down for how you want the annotation to show up: Use Canvas Settings will depend on the canvas options you have set, Always will show the annotation regardless of the canvas setting, and Never will not show the annotation on the canvas. You also have the option to show the annotation above the tool instead of below it.
You can also add comment strings to your expression editors by adding a double forward slash, //, before a single line of comment. Comment strings are useful for marking and noting the logic used in an expression. If you want to add a block of comments, enclose it in a forward slash and asterisk, /*, at the start and an asterisk and forward slash, */, at the end. This can be done in any expression editor, such as those in the Filter tool, the Formula tool, the Multi-Field Formula, and many others.

Tip number four: organize like Marie Kondo. Clutter is caused by a failure to return things to where they belong. Alteryx helps us channel our inner Marie Kondo with the Tool Container, which helps us organize or isolate a section of our workflow. To use the tool, drag a Tool Container from the Documentation tool set onto the canvas, then select or highlight the tools you need to group and drag them inside the rectangular area of the Tool Container. Now you have them separated from the rest. These tools will still be processed while the container is open; you can disable them by checking the Disabled checkbox in the configuration window, or by clicking the button at the upper left of the Tool Container itself. It is blue if the container is enabled and white if it is disabled. Disabling the container hides the tools and stops them from processing the data in that stream. You can also change the Tool Container's caption to make it more informative, by replacing the name in the text box or double-clicking on the name, and you have the option to change its text color, fill color, border color, transparency, and margin.

Tip number five: declutter your wires. Having a hundred tools can also mean having a hundred connection wires. If a couple of connections are blocking your view, you can make them wireless by clicking on the line itself and enabling the wireless option; anchors with this symbol have wireless connections. What if you have multiple connections attached to a tool and realize you need something in between? You do not have to delete every wire only to reconnect it to the inserted tool: just right-click on the tool, select Insert After, and select the tool to insert; the outgoing connections will be connected to the new tool. If it is the other way around, and you want to get rid of an extra tool in between, right-click on the tool and select Delete and Connect Around.

Tip number six: align and distribute. It can be hard to align your tools at times, even when you are using the grid view. The good thing is that we have an auto-align function in Designer: just highlight your tools, then right-click and choose either Align Horizontally for a horizontal canvas or Align Vertically for a vertical canvas orientation. You can do the same by pressing the shortcut keys Ctrl+Shift+minus to align horizontally and Ctrl+Shift+plus to align vertically.

Tip number seven: shortcuts make your work fast. Alteryx has several hotkeys that can help you design your workflows efficiently; here are some of our favorites. If you have several workflow tabs open, you can move between them by pressing Ctrl+Tab; this shows a small window with each workflow's name and a small canvas preview. If you want to save all of those open workflows, press Ctrl+Shift+S. The usual shortcuts for undo, redo, copy, cut, and paste also work for the tools in our workflow. Press Ctrl+F to show the Find menu; this can be helpful if you are looking for a specific tool or function in your canvas.
To pan your canvas without using the scroll buttons, hold your mouse's center or scroll button; the pointer will change to a pan symbol, and you can pan the canvas for as long as you hold the center button. You can also zoom in and out by pressing Ctrl and scrolling up and down. If you want to refresh your workflow when the incoming data file has been changed externally, press F5. If you're not a subscriber, click down below to subscribe so you get notified about similar videos we upload. To get the files the instructor used in this tutorial and follow along, click over there, and click over there to watch more videos on YouTube from Simon Sez IT.
Info
Channel: Simon Sez IT
Views: 388,114
Keywords: Alteryx, Introduction to Alteryx, Alteryx training, Alteryx tutorial, Alteryx for beginners, Introduction to alteryx training, alteryx tutorial, alteryx training, alteryx demo, advanced analytics, what is alteryx, customer analytics, big data analysis, data preparation, how to use alteryx, alteryx tutorial for beginners, alteryx designer, alteryx certification, business analytics, alteryx for excel, alteryx tutorial point, alteryx beginners, alteryx for excel users
Id: YISWJ-G7Zt4
Length: 171min 2sec (10262 seconds)
Published: Mon Jan 18 2021