Text Data Types

Introduction to PowerQuery Data Transformations
9 minutes
Share the link to this page
Copied
  Completed
You need to have access to the item to view this lesson.
One-time Fee
$69.99
List Price:  $99.99
You save:  $30
€64.90
List Price:  €92.72
You save:  €27.82
£55.48
List Price:  £79.26
You save:  £23.78
CA$94.84
List Price:  CA$135.49
You save:  CA$40.65
A$107.44
List Price:  A$153.49
You save:  A$46.05
S$94.44
List Price:  S$134.92
You save:  S$40.48
HK$547.73
List Price:  HK$782.50
You save:  HK$234.77
CHF 63.11
List Price:  CHF 90.16
You save:  CHF 27.05
NOK kr759.51
List Price:  NOK kr1,085.06
You save:  NOK kr325.55
DKK kr484.31
List Price:  DKK kr691.90
You save:  DKK kr207.59
NZ$117.18
List Price:  NZ$167.41
You save:  NZ$50.23
د.إ257.02
List Price:  د.إ367.19
You save:  د.إ110.16
৳7,686.18
List Price:  ৳10,980.73
You save:  ৳3,294.54
₹5,834.28
List Price:  ₹8,335.04
You save:  ₹2,500.76
RM330.66
List Price:  RM472.40
You save:  RM141.73
₦97,668.24
List Price:  ₦139,532.04
You save:  ₦41,863.80
₨19,468.63
List Price:  ₨27,813.53
You save:  ₨8,344.89
฿2,546.70
List Price:  ฿3,638.31
You save:  ฿1,091.60
₺2,265.91
List Price:  ₺3,237.16
You save:  ₺971.24
B$351.06
List Price:  B$501.53
You save:  B$150.47
R1,322.01
List Price:  R1,888.66
You save:  R566.65
Лв126.52
List Price:  Лв180.76
You save:  Лв54.23
₩94,315.14
List Price:  ₩134,741.70
You save:  ₩40,426.55
₪257.38
List Price:  ₪367.71
You save:  ₪110.32
₱3,931.96
List Price:  ₱5,617.33
You save:  ₱1,685.37
¥10,592.26
List Price:  ¥15,132.45
You save:  ¥4,540.19
MX$1,160.67
List Price:  MX$1,658.17
You save:  MX$497.50
QR255.40
List Price:  QR364.88
You save:  QR109.47
P964.60
List Price:  P1,378.06
You save:  P413.46
KSh9,279.27
List Price:  KSh13,256.67
You save:  KSh3,977.40
E£3,320.75
List Price:  E£4,744.14
You save:  E£1,423.38
ብር3,980.93
List Price:  ብር5,687.29
You save:  ብር1,706.36
Kz58,401.19
List Price:  Kz83,433.85
You save:  Kz25,032.66
CLP$68,800.17
List Price:  CLP$98,290.17
You save:  CLP$29,490
CN¥505.43
List Price:  CN¥722.08
You save:  CN¥216.64
RD$4,147
List Price:  RD$5,924.54
You save:  RD$1,777.54
DA9,435.57
List Price:  DA13,479.97
You save:  DA4,044.39
FJ$157.96
List Price:  FJ$225.66
You save:  FJ$67.70
Q546.07
List Price:  Q780.13
You save:  Q234.06
GY$14,663.88
List Price:  GY$20,949.30
You save:  GY$6,285.42
ISK kr9,758.70
List Price:  ISK kr13,941.60
You save:  ISK kr4,182.90
DH709.58
List Price:  DH1,013.74
You save:  DH304.15
L1,235.01
List Price:  L1,764.37
You save:  L529.36
ден3,991.30
List Price:  ден5,702.11
You save:  ден1,710.80
MOP$564.38
List Price:  MOP$806.29
You save:  MOP$241.91
N$1,336.41
List Price:  N$1,909.24
You save:  N$572.83
C$2,577.43
List Price:  C$3,682.20
You save:  C$1,104.77
रु9,342.84
List Price:  रु13,347.49
You save:  रु4,004.64
S/260.45
List Price:  S/372.08
You save:  S/111.63
K268
List Price:  K382.87
You save:  K114.87
SAR262.59
List Price:  SAR375.15
You save:  SAR112.55
ZK1,736.79
List Price:  ZK2,481.24
You save:  ZK744.44
L322.64
List Price:  L460.94
You save:  L138.29
Kč1,642.49
List Price:  Kč2,346.52
You save:  Kč704.02
Ft25,571.54
List Price:  Ft36,532.34
You save:  Ft10,960.80
SEK kr748.30
List Price:  SEK kr1,069.05
You save:  SEK kr320.74
ARS$60,059.22
List Price:  ARS$85,802.56
You save:  ARS$25,743.34
Bs483.92
List Price:  Bs691.35
You save:  Bs207.42
COP$270,819.36
List Price:  COP$386,901.39
You save:  COP$116,082.02
₡35,222.95
List Price:  ₡50,320.66
You save:  ₡15,097.70
L1,728.74
List Price:  L2,469.73
You save:  L740.99
₲517,758.37
List Price:  ₲739,686.52
You save:  ₲221,928.15
$U2,629.19
List Price:  $U3,756.16
You save:  $U1,126.96
zł279.29
List Price:  zł399
You save:  zł119.71
Already have an account? Log In

Transcript

The first data type we're going to cover is the text data type text consists of letters, numbers, or symbols placed together in a specific order as demonstrated text is treated quite differently from numbers. This whole chapter will focus on the transform tab, and then this lesson, in particular, the text columns subsection. Let's start with some common text data problems. First, let's address capitalization in text formatting. Unlike numbers, texts can be formatted quite differently with uppercase and lowercase letters. Look at my names information.

Someone was typing the names and clearly Miss capitalized them Excel has the functions proper, upper and lower to manage text formats, and the query editor has very similar function. So let's use this query and edit our names to be more appropriately capitalized on the transform tab under the text column. There's a format drop down with quite a few options. The first three options cover lowercase, uppercase, and proper capitalization. Each one applies the desired formatting to all selected data fields that are pretty straightforward. After these three options, there's the clean trim, Prefix and Suffix.

Another common problem occurs with leading and trailing whitespace. So let's close off of the query and look at our data a bit closer. Have you ever tried to do a lookup or some ifs using a pair of seemingly identical data sets get your formula doesn't work. Let's do a quick calculation in Excel. Here. I want to calculate the average score issued by Tracy Jones.

Manually calculating it's pretty easy since there are only four entries, so my average is five. But using the average ifs function, I get 4.331 of the Tracy Jones values in my table has trailing whitespace at the beginning, that makes it incompatible with the lookup value if you've encountered this You've probably used the trim function to correct it. The query editor can do this with the trim command as well. So let's go back into our query and add that adjustment. It works identically to the capitalization functions. It's just slightly less noticeable, since all that happens is that all the whitespace gets removed from the beginning and the end.

Closing loading this query by average ifs can now find all of the right entries, and the average comes out to five. The next command in our drop down is the clean command. The Clean command works very much like the trim command, except it simply removes unreadable characters. unreadable characters are data fragments that can be embedded in information unreadable by Excel or by your text editor. They'll generally show up as a backslash question mark, and simply using the clean command will strip them out and give you the rest of your text as it appears. For most data.

The Clean command will do almost Nothing. For example here apply it to my report does nothing since there are no missing characters so you could feel safe about applying it to data sets where you might think that there could be unreadable characters involved. The final two commands are the Add prefix and add suffix commands. Well, there's no necessity, I can show how they work by adding a prefix to my social security number, the prefix I'll add this I'll just add SSN colon, and press OK. Doing so adds the SSN colon to the beginning of the text. The Add suffix does the exact same thing except it does it to the very end of the data set.

As you can see, the format drop down is true to its name. It offers you different ways to reformat and edit the text and its appearance with the Add suffix and add prefix options be a little different in that they actually let you add information into the text. The next text specific functions by the split column commands In the prior chapters and the append queries, we use the merge columns command to combine data. Well, the split column is basically the opposite of that. Since we've added SSN colon to our social security number field to the beginning of each row. In fact, I now want to split the number off of that set, I have two different options.

To do this, I can split based upon a specific delimiter or I can split based upon the number of characters. Let's start with the by delimiter approach. The by delimiter approach offers a number of different options. You can use a predefined delimiter, such as colons, commas, equal signs, semi colons, and so forth. Or you can create a custom delimiter. Each of these is only a single entry except for the custom the custom can be a longer text string.

So for this, since I use colon, I can use my colon here and then I have some additional options. I can split it once at the leftmost delimiter Once the rightmost delimiter, or each time that delimiter is met, for this data set, it won't make a difference. I'll choose the left. And I can see my data set gets split. But let's undo that. And let's look at another way of doing it, I can use a custom delimiter.

And now I can actually enter text. So if I enter SSN colon, the exact thing I'm looking for, I can either against split at the first time it intersects it, the last time it intersects it, or at every occurrence. Again, since SSN only exists once here, this doesn't matter. If I use a hyphen ended each occurrence, I would get the number split into three components. But using this approach, the SSN actually gets split out and I get my number and then a blank because there was no information before SSN. So that leaves us with the number of characters approach.

The numbers of characters approach is very similar except you to find the number of characters that you want to go so if I enter four characters. Now I can split it by four characters in three different ways. The first way is to split once. So that's to take the first four characters on the left, and then all of the characters to the right and group it into two columns, once as far right as possible, would be like the right function in Excel, you take the four characters on the far right of the text, and then you take all the characters left at that point, repeatedly, we'll cut that data set up every four characters. So if you had 12 characters in your text, you'd get three different entries. The first four characters, the second set of four characters and the third set of four characters, this won't generate the result that we want them.

So let's see how it came out. I get SSN colon, 482 hyphens. The next 454 hyphen, two is the next door and then the final three digits. So my text field got split up four times in sets of four characters each. I can do it correctly by going back splitting the column by the number of characters and entering for and once as far left as possible to get my SSN and my social security number. The last few commands we'll look at are the merge columns and the extract commands, we're not going to go into the parse commands in any detail, because what they do is they allow you to parse the text information for either XML structures or JSON structures.

Both of those are beyond the scope of anything that we'll handle in this course, the merge columns command, which we're already somewhat familiar with, takes two columns. Let's take our two SSN fields and puts them together, you can include a separator if you want, I'm gonna go ahead and add a space. And upon doing so you can take two columns, text or otherwise and combine them into a single column. The extract functions are a little simpler than our split column functions, but work very much the same way. The first one is the length field. Unlike the other three entries, the length field doesn't extract much.

It tells you How long a field is. So our social security number we know is 15. Actually, now it's 16 characters long because we split it into force. So if I take length, I get the number 16 for all my fields. Now the first characters, we can take the first characters, and this is like a really simplified version of our split column by number of characters. If I do four, I'm going to get just the first four, and it's going to drop everything else, the last characters is almost the same, except, again, we're going to go from the right, we're going to take the last four of our social security number and drop everything else, the range functions sort of like middle, we tell it where to start.

So let's start at four. And let's take in this case 12 characters, press OK, that starts at the fourth character and takes the last 12, which gives us our social security number yet again. So these are all simplified versions of the split column that take that single column and then cut part of the text out of it. And so the That covers the text data functions exposed on our transform tab. With text complete. Let's go ahead and move on to the number data type.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.