Python 101 - Working with Strings

You will be using strings very often when you program. A string is a sequence of characters surrounded by single, double or triple quotes. Python 3's documentation describes str as a "Text Sequence Type". You can convert other types to a string using the built-in str() function.

In this article you will learn about:

  • Creating strings
  • String methods
  • String formatting
  • String concatenation
  • String slicing

Let's get started by learning the different ways to create strings!

Creating Strings

Here are some examples of creating strings:

name = 'Mike'
first_name = 'Mike'
last_name = "Driscoll"
triple = """multi-line
string"""

When you use triple quotes, you may use three double quotes at the beginning and end of the string or three single quotes. Also, note that using triple quotes allows you to create multi-line strings. Any whitespace within the string will also be included.
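To see that the whitespace really is kept, here is a small sketch; the variable name poem is only an illustration:

```python
# A triple-quoted string keeps the line break and the leading spaces
# exactly as they were typed between the quotes.
poem = """Roses are red,
    violets are blue"""

print(poem)
print(poem.count('\n'))       # 1: one line break inside the string
print('    violets' in poem)  # True: the four leading spaces are kept

# Three single quotes and three double quotes create equal strings.
print('''same text''' == """same text""")  # True
```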

Here is an example of converting an integer to a string:

>>> number = 5
>>> str(number)
'5'
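str() is not limited to integers. A short sketch with a few other built-in types:

```python
# str() returns a readable text version of almost any object.
values = [3.5, True, None, [1, 2]]
as_text = [str(value) for value in values]
print(as_text)  # ['3.5', 'True', 'None', '[1, 2]']

# The result really is a string, so string methods work on it.
print(type(str(5)))     # <class 'str'>
print(str(5).zfill(3))  # 005
```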

In Python, backslashes can be used to create escape sequences. Here are a few examples:

  • \b - backspace
  • \n - line feed
  • \r - ASCII carriage return
  • \t - tab

There are several others that you can learn about if you read Python's documentation.
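Once the string is created, each escape sequence is just one character. A quick sketch using a made-up record:

```python
# Each escape sequence becomes a single character in the final string.
print(len('\t'))  # 1
print(len('\n'))  # 1

# A hypothetical record: tabs separate the columns, '\n' separates the lines.
record = 'Name:\tMike\nAge:\t18'
print(record)
print(record.split('\n'))  # ['Name:\tMike', 'Age:\t18']
```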

You can also use backslashes to escape quotes:

>>> 'This string has a single quote, \', in the middle'
"This string has a single quote, ', in the middle"

If you did not have the backslash in the code above, you would receive a SyntaxError:

>>> 'This string has a single quote, ', in the middle'
Traceback (most recent call last):
  Python Shell, prompt 59, line 1
invalid syntax: <string>, line 1, pos 38

This occurs because the string ends at that second single quote. It is usually better to mix double and single quotes to get around this issue:

>>> "This string has a single quote, ', in the middle"
"This string has a single quote, ', in the middle"

In this case, you create the string using double quotes and put a single quote inside of it. This is especially helpful when working with contractions, such as "don't", "can't", etc.

Now let's move along and see what methods you can use with strings!

String Methods

In Python, everything is an object. You will learn how useful this can be in chapter 18 when you learn about introspection. For now, just know that strings have methods (or functions) that you can call on them.

Here are three examples:

>>> name = 'mike'
>>> name.capitalize()
'Mike'
>>> name.upper()
'MIKE'
>>> 'MIke'.lower()
'mike'

The method names give you a clue as to what they do. For example, .capitalize() will change the first character in the string to a capital letter and make the rest of the string lowercase.
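Because .capitalize() also lowercases the remaining characters, it can give different results from similar-sounding methods. A quick comparison with .swapcase() and .title(), which are also string methods:

```python
name = 'mIKE'
print(name.capitalize())  # Mike: the rest of the string is lowercased too
print(name.swapcase())    # Mike: every letter flips its case

print('mike driscoll'.capitalize())  # Mike driscoll: only the first letter
print('mike driscoll'.title())       # Mike Driscoll: the first letter of each word
```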

To get a full listing of the methods and attributes that you can access, you can use Python's built-in dir() function:

>>> dir(name)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize',
'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace',
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']

The first third of the listing consists of special methods that are sometimes called "dunder methods" (AKA double-underscore methods) or "magic methods". You can ignore these for now, as they are used more for intermediate and advanced use cases. The items in the list above that don't have double underscores at the beginning are the ones that you will probably use the most.

You will find that the .strip() and .split() methods are especially useful when parsing or manipulating text.

You can use .strip() and its variants, .rstrip() and .lstrip(), to strip whitespace, including tab and newline characters, from the ends of a string. .rstrip() only cleans the right end and .lstrip() only cleans the left end. This is especially useful when you are reading in a text file that you need to parse.

In fact, you will often end up stripping end-of-line characters from strings and then using .split() on the result to parse out sub-strings.
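Here is a minimal sketch of that strip-then-split pattern. It uses io.StringIO to stand in for a real file object, and the file contents are made up for illustration:

```python
import io

# The three strip methods differ only in which end they clean up.
padded = '  hello\n'
print(repr(padded.strip()))   # 'hello'
print(repr(padded.lstrip()))  # 'hello\n'
print(repr(padded.rstrip()))  # '  hello'

# io.StringIO behaves like a file opened for reading (hypothetical data).
fake_file = io.StringIO('alice 30\n  bob 25  \n\tcarol 41\n')

records = []
for line in fake_file:
    cleaned = line.strip()       # remove spaces, tabs and the trailing '\n'
    if not cleaned:
        continue                 # skip blank lines
    name, age = cleaned.split()  # split on runs of whitespace
    records.append((name, int(age)))

print(records)  # [('alice', 30), ('bob', 25), ('carol', 41)]
```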

Let's do a little exercise where you will learn how to parse out the 2nd word in a string.

To start, here's a string:

>>> my_string = 'This is a string of words'

Now to get the parts of a string, you can call .split(), like this:

>>> my_string.split()
['This', 'is', 'a', 'string', 'of', 'words']

The result is a list of strings. Now normally you would assign this result to a variable, but for demonstration purposes, you can skip that part.

Instead, since you now know that the result is a list, you can use indexing to get the second element:

>>> 'This is a string of words'.split()[1]
'is'

Remember, in Python, list indices start at 0 (zero), so when you ask for element 1 (one), you get the second element in the list.

When doing string parsing for work, I personally have found that you can use the .strip() and .split() methods pretty effectively to get almost any data that you need. Occasionally you will find that you might also need to use Regular Expressions (regex), but most of the time these two methods are enough.
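.split() also accepts a separator and an optional maximum number of splits. A sketch that parses two made-up pieces of text:

```python
# A comma-separated record with untidy spacing (hypothetical data).
row = ' Mike , Driscoll ,  Iowa \n'
fields = [field.strip() for field in row.strip().split(',')]
print(fields)  # ['Mike', 'Driscoll', 'Iowa']

# A maxsplit of 1 splits only at the first colon, so later colons survive.
header = 'Subject: Re: Python strings'
key, value = header.split(':', 1)
print(key)            # Subject
print(value.strip())  # Re: Python strings

# With no argument, a run of whitespace counts as one separator;
# with ' ', every single space is a separator.
print('a  b'.split())     # ['a', 'b']
print('a  b'.split(' '))  # ['a', '', 'b']
```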

String Formatting

String formatting, or string substitution, is where you have a value that you would like to insert into another string. This is especially useful when you need a template, like a form letter. But you will also use string substitution a lot for debugging output, printing to standard out and much more.

Python has three different ways to accomplish string formatting:

  • Using the % Method
  • Using .format()
  • Using formatted string literals (f-strings)

This book will focus on f-strings the most and also use .format() from time to time. But it is good to understand how all three work.

Let's take a few moments to learn more about string formatting.

Formatting Strings Using %s (printf-style)

Using the % method is Python's oldest method of string formatting. It is sometimes referred to as "printf-style string formatting". If you have used C or C++ in the past, then you may already be familiar with this type of string substitution. For brevity, you will learn the basics of using % here.

Note: This type of formatting can be quirky to work with and has been known to lead to common errors, such as failing to display Python tuples and dictionaries correctly. Using either of the other two methods is preferred in that case.
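The tuple problem is easy to reproduce. In this sketch, the variable point is only an illustration:

```python
point = (3, 4)

try:
    # % treats the tuple as the full argument list: two values, one %s.
    print('Point: %s' % point)
except TypeError as err:
    print('TypeError:', err)  # not all arguments converted during string formatting

# Wrapping the tuple in a one-item tuple makes % treat it as one value.
print('Point: %s' % (point,))     # Point: (3, 4)
# .format() has no such quirk.
print('Point: {}'.format(point))  # Point: (3, 4)
```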

The most common conversion is %s, which means convert any Python object to a string using str().

Here is an example:

>>> name = 'Mike'
>>> print('My name is %s' % name)
My name is Mike

In this code, you take the variable name and insert it into another string using the special %s syntax. To make it work, you need to use % outside of the string followed by the string or variable that you want to insert.

Here is a second example that shows that you can pass an int into a string and have it automatically converted for you:

>>> age = 18
>>> print('You must be at least %s to continue' % age)
You must be at least 18 to continue

This sort of thing is especially useful when you need to convert an object but don't know what type it is.

You can also do string formatting with multiple variables. In fact, there are two ways to do this.

Here's the first one:

>>> name = 'Mike'
>>> age = 18
>>> print('Hello %s. You must be at least %i to continue!' % (name, age))
Hello Mike. You must be at least 18 to continue!

In this example, you create two variables and use %s and %i. The %i indicates that you are going to pass an integer. To pass in multiple items, you use the percent sign followed by a tuple of the items to insert.
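%s accepts any object, while %i expects a number. A short sketch of the difference:

```python
age = 18
print('%i' % age)   # 18
print('%i' % 18.9)  # 18: a float is truncated to an integer
print('%s' % 18.9)  # 18.9: %s just calls str()

try:
    print('%i' % 'eighteen')
except TypeError as err:
    # A string is not a number, so %i refuses it.
    print('TypeError:', err)
```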

You can make this clearer by using names, like this:

>>> print('Hello %(name)s. You must be at least %(age)i to continue!' % {'name': name, 'age': age})
Hello Mike. You must be at least 18 to continue!

When the argument on the right side of the % sign is a dictionary (or another mapping type), then the formats in the string must refer to the parenthesized key in the dictionary. In other words, if you see %(name)s, then the dictionary to the right of the % must have a name key.

If you do not include all the keys that are required, you will receive an error:

>>> print('Hello %(name)s. You must be at least %(age)i to continue!' % {'age': age})
Traceback (most recent call last):
   Python Shell, prompt 23, line 1
KeyError: 'name'

For more information about using the printf-style string formatting, you should see the following link:

https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting

Now let's move on to using the .format() method.

Formatting Strings Using .format()

Python strings have supported the .format() method for a long time. While this book will focus on using f-strings, you will find that .format() is still quite popular.

For full details on how formatting works, see the following:

https://docs.python.org/3/library/string.html#formatstrings

Let's take a look at a few short examples to see how .format() works:

>>> age = 18
>>> name = 'Mike'
>>> print('Hello {}. You must be at least {} to continue!'.format(name, age))
Hello Mike. You must be at least 18 to continue!

This example uses positional arguments. Python looks for two instances of {} and will insert the variables accordingly. If you do not pass in enough arguments, you will receive an error like this:

>>> print('Hello {}. You must be at least {} to continue!'.format(age))
Traceback (most recent call last):
    Python Shell, prompt 33, line 1
IndexError: tuple index out of range

This error indicates that you do not have enough items inside the .format() call.

You can also use named arguments in a similar way to the previous section:

>>> age = 18
>>> name = 'Mike'
>>> print('Hello {name}. You must be at least {age} to continue!'.format(name=name, age=age))
Hello Mike. You must be at least 18 to continue!

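Instead of passing a dictionary to .format(), you pass in the parameters by name. In fact, if you do try to pass in a dictionary directly, you will receive an error, because .format() treats the dictionary as a single positional argument and cannot find a keyword argument called name: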

>>> print('Hello {name}. You must be at least {age} to continue!'.format({'name': name, 'age': age}))
Traceback (most recent call last):
  Python Shell, prompt 34, line 1
KeyError: 'name'

There is a workaround for this though:

>>> print('Hello {name}. You must be at least {age} to continue!'.format(**{'name': name, 'age': age}))
Hello Mike. You must be at least 18 to continue!

This looks a bit weird, but in Python when you see a double asterisk (**) used like this, it unpacks the dictionary into keyword arguments. So the call above is equivalent to .format(name=name, age=age).
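To convince yourself that ** unpacking is the same as passing the keyword arguments by hand, you can compare the two results. The sketch also uses .format_map(), a string method that appears in the dir() listing earlier and accepts the mapping directly:

```python
template = 'Hello {name}. You must be at least {age} to continue!'
person = {'name': 'Mike', 'age': 18}

unpacked = template.format(**person)             # dict -> keyword arguments
explicit = template.format(name='Mike', age=18)
print(unpacked == explicit)                      # True

# format_map() takes the dictionary itself, without unpacking.
print(template.format_map(person))
# Hello Mike. You must be at least 18 to continue!
```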

You can also repeat a variable multiple times in the string using .format():

>>> name = 'Mike'
>>> print('Hello {name}. Why do they call you {name}?'.format(name=name))
Hello Mike. Why do they call you Mike?

Here you refer to {name} twice in the string and you are able to replace both of them using .format().

If you want, you can also interpolate values using numbers:

>>> print('Hello {1}. You must be at least {0} to continue!'.format(name, age))
Hello 18. You must be at least Mike to continue!

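The numbers refer to the positions of the arguments passed to .format(), starting at 0 (zero). In this example, {0} is name and {1} is age, so because the template puts {1} first, the age and the name end up swapped in the output.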
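If you number the placeholders in the same order as the arguments, the output comes out right, and you can reuse a number as often as you like:

```python
name = 'Mike'
age = 18

# {0} is the first argument (name) and {1} is the second (age).
print('Hello {0}. You must be at least {1} to continue!'.format(name, age))
# Hello Mike. You must be at least 18 to continue!

# The same index can appear several times.
print('{0}, {0}, {0}! Is that you?'.format(name))
# Mike, Mike, Mike! Is that you?
```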

A common coding style when working with .format() is to create a template string and save it to a variable to be used later:

>>> age = 18
>>> name = 'Mike'
>>> greetings = 'Hello {name}. You must be at least {age} to continue!'
>>> greetings.format(name=name, age=age)
'Hello Mike. You must be at least 18 to continue!'

This allows you to reuse greetings and pass in updated values for name and age later on in your program.
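For example, you might format the same template for several people. The visitors list here is made-up sample data:

```python
greetings = 'Hello {name}. You must be at least {age} to continue!'
visitors = [('Mike', 18), ('Jane', 21)]  # hypothetical sample data

# The template itself never changes; only the values passed in do.
messages = [greetings.format(name=n, age=a) for n, a in visitors]
for message in messages:
    print(message)
```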

You can also specify the string width and alignment:

>>> '{:<20}'.format('left aligned')
'left aligned        '
>>> '{:>20}'.format('right aligned')
'       right aligned'
>>> '{:^20}'.format('centered')
'      centered      '

Left alignment is the default for strings (numbers are right aligned by default). The colon (:) tells Python that you are going to apply some kind of formatting. In the first example, you are specifying that the string be left aligned and 20 characters wide. The second example is also 20 characters wide, but it is right aligned. Finally, the ^ tells Python to center the string within a 20-character field.
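Two more details about alignment: you can put a fill character before the alignment symbol, and strings and numbers use different defaults when you only give a width. A quick sketch:

```python
# '*' is the fill character, '^' centers, 20 is the width.
print(repr('{:*^20}'.format('centered')))  # '******centered******'

# With only a width, strings are padded on the right (left aligned) ...
print(repr('{:10}'.format('text')))        # 'text      '
# ... while numbers are padded on the left (right aligned).
print(repr('{:10}'.format(42)))            # '        42'
```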

If you want to pass in a variable like in the previous examples, here is how you would do that:

>>> '{name:^20}'.format(name='centered')
'      centered      '

Note that the name must come before the : inside of the {}.

At this point, you should be pretty familiar with the way .format() works.

Let's go ahead and move along to f-strings!

Formatting Strings with f-strings

Formatted string literals, or f-strings, are strings that have an "f" at the beginning and curly braces inside of them that contain expressions, much like the placeholders you saw in the previous section. Each expression can be followed by a colon and a format specifier, which tells the f-string about any special processing the inserted value needs, such as justification, float precision, etc.

The f-string was added in Python 3.6. You can read more about it and how it works by checking out PEP 498 here:

https://www.python.org/dev/peps/pep-0498/

The expressions that are contained inside of f-strings are evaluated at runtime. You cannot use an f-string as a docstring for a function, method or class, even one that contains no expressions: a docstring must be a plain string literal, so Python does not store an f-string in the __doc__ attribute.
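You can check this behavior yourself. The two functions in this sketch are only illustrations:

```python
def greet():
    f"Say hello to the user"  # looks like a docstring, but is an f-string
    return 'hello'


def greet_plain():
    "Say hello to the user"  # a plain string literal is a real docstring
    return 'hello'


print(greet.__doc__)        # None
print(greet_plain.__doc__)  # Say hello to the user
```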

Let's go ahead and look at a simple example:

>>> name = 'Mike'
>>> age = 20
>>> f'Hello {name}. You are {age} years old'
'Hello Mike. You are 20 years old'

Here you create the f-string by putting an "f" right before the single, double or triple quote that begins your string. Then inside of the string, you use the curly braces, {}, to insert variables into your string.

However, your curly braces must enclose something. If you create an f-string with empty braces, you will get an error:

>>> f'Hello {}. You are {} years old'
SyntaxError: f-string: empty expression not allowed

The f-string can do things that neither %s nor .format() can do, though. Because f-strings are evaluated at runtime, you can put any valid Python expression inside of them.

For example, you could increase the age variable:

>>> age = 20
>>> f'{age+2}'
'22'

Or call a method or function:

>>> name = 'Mike'
>>> f'{name.lower()}'
'mike'
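The braces in an f-string also accept the same format specifiers after a colon that .format() uses, which is how you get justification and float precision. A small sketch (pi is just an example value):

```python
name = 'Mike'
pi = 3.14159

print(repr(f'{name:^20}'))         # '        Mike        ': centered in 20 characters
print(f'{pi:.2f}')                 # 3.14: two digits after the decimal point
# The expression comes first, then the colon and the specifier.
print(repr(f'{name.upper():>8}'))  # '    MIKE'
```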

You can also access dictionary values directly inside of an f-string:

>>> sample_dict = {'name': 'Tom', 'age': 40}
>>> f'Hello {sample_dict["name"]}. You are {sample_dict["age"]} years old'
'Hello Tom. You are 40 years old'

However, in Python versions before 3.12, backslashes are not allowed in f-string expressions (Python 3.12 lifted this restriction, as described in PEP 701):

>>> print(f'My name is {name\n}')
SyntaxError: f-string expression part cannot include a backslash

But you can use backslashes outside of the expression in an f-string:

>>> name = 'Mike'
>>> print(f'My name is {name}\n')
My name is Mike
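If you need a character such as \n in the computed part of an f-string on a version older than 3.12, a common workaround is to store it in a variable first. The names list here is made up:

```python
names = ['Mike', 'Jane', 'Tom']
newline = '\n'  # the backslash lives outside the braces

# newline.join(names) puts a line break between each name.
report = f'Names:{newline}{newline.join(names)}'
print(report)
```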

One other thing that you can't do is add a comment inside of an expression in a single-line f-string (Python 3.12 and later allow comments only inside multi-line, triple-quoted f-strings):

>>> f'My name is {name # name of person}'
SyntaxError: f-string expression part cannot include '#'

Python 3.8 added support for = in f-strings. Writing {expression=} expands to the text of the expression, an equal sign, and then the evaluated result. That sounds kind of complicated, so let's look at an example:

>>> username = 'jdoe'
>>> f'Your {username=}'
"Your username='jdoe'"

This example demonstrates that the text inside of the expression, username=, is added to the output, followed by the repr() of the value of username, which is why jdoe appears in quotes.
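The = specifier is handy for quick debugging output. It keeps any spaces you type around the expression, and if you add a format specifier after the =, the formatted value is shown instead of the repr(). A short sketch:

```python
age = 20
price = 3.14159
name = 'Mike'

print(f'{age + 2=}')       # age + 2=22
print(f'{price=:.2f}')     # price=3.14
print(f'{name.lower()=}')  # name.lower()='mike'
```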

f-strings are very powerful and extremely useful. They will simplify your code quite a bit if you use them wisely. You should definitely give them a try.

Let's find out what else you can do with strings!

String Concatenation

Strings also allow concatenation, which is a fancy word for joining two strings into one.

To concatenate strings together, you can use the + sign:

>>> first_string = 'My name is'
>>> second_string = 'Mike'
>>> first_string + second_string
'My name isMike'

Oops! It looks like the strings merged in a weird way because you forgot to add a space to the end of the first_string. You can change it like this:

>>> first_string = 'My name is '
>>> second_string = 'Mike'
>>> first_string + second_string
'My name is Mike'

Another way to merge strings is to use the .join() method. The .join() method accepts an iterable, such as a list, of strings and joins them together.

>>> first_string = 'My name is '
>>> second_string = 'Mike'
>>> ''.join([first_string, second_string])
'My name is Mike'

Joining with an empty string puts the strings right next to each other. The string you call .join() on is used as the separator, so you can put something inside of it:

>>> '***'.join([first_string, second_string])
'My name is ***Mike'

In this case, *** is inserted between each pair of strings in the list, so it ends up between the first and second string.
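In practice, .join() is most often used with a separator such as ', ' to turn a list into readable text. Every item must already be a string, so numbers need converting first. A sketch:

```python
words = ['spam', 'eggs', 'ham']
print(', '.join(words))  # spam, eggs, ham

numbers = [1, 2, 3]
try:
    '-'.join(numbers)  # fails: the items are ints, not strings
except TypeError as err:
    print('TypeError:', err)

# Convert each item with str() before joining.
print('-'.join(str(number) for number in numbers))  # 1-2-3
```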

More often than not, you can use an f-string rather than concatenation or .join() and the code will be easier to follow.

String Slicing

Slicing in strings works in much the same way that it does for Python lists. Let's take the string "Mike". The letter "M" is at position zero and the letter "e" is at position 3.

If you want to grab characters 0-3, you would use this syntax: my_string[0:4]

What that means is that you want the substring starting at position zero up to but not including position 4.

Here are a few examples:

>>> 'this is a string'[0:4]
'this'
>>> 'this is a string'[:4]
'this'
>>> 'this is a string'[-4:]
'ring'

The first example grabs the first four letters from the string and returns them. If you want to, you can drop the zero as that is the default and use [:4] instead, which is what example two does.

You can also use negative position values, which count backward from the end of the string. So [-4:] means that you want to start four characters from the end and continue to the end, which gives you the last four letters of the string.
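Going back to the "Mike" example, here is a sketch of indexing and a few more slices, including the optional third value, the step:

```python
name = 'Mike'

print(name[0])       # M: position zero
print(name[3])       # e: position 3
print(name[-1])      # e: negative positions count from the end
print(name[1:3])     # ik: positions 1 and 2, stopping before 3
print(name[::2])     # Mk: every second character
print(name[::-1])    # ekiM: a negative step walks backward
print(name[2:100])   # ke: a slice past the end does not raise an error
```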

You should play around with slicing on your own and see what other slices you can come up with.

Wrapping Up

Python strings are powerful and quite useful. They can be created using single, double or triple quotes. Strings are objects, so they have methods. You also learned about string concatenation, string slicing and three different methods of string formatting.

The newest flavor of string formatting is the f-string. It is also the most powerful and the currently preferred method for formatting strings.


Copyright © 2024 Mouse Vs Python | Powered by Pythonlibrary