Python String Data Type

codeforests
5 min read · Jul 12, 2020

In the previous article, we discussed Python variables, including string variables. A string is a built-in Python data type that holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will share the various operations you can perform with the Python string data type.

Python string data type

In Python, you can define a string variable with single quotes, double quotes or triple quotes, and use the type() function to verify the data type of your variable. E.g.:

text1 = 'hello \n world!' 
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)

You should see the output below, with the data type showing as “str”. Note that the \n in text1 is printed as a line break:

<class 'str'> hello 
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine
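
To illustrate why there are several quote styles, here is a minimal sketch (the variable names are made up for this example). Single and double quotes produce the same kind of string, so you can pick whichever avoids escaping, while triple quotes let a literal span several lines:

```python
# Hypothetical names, used only for this illustration.
single = 'it is "quoted"'    # double quotes need no escaping inside single quotes
double = "it's fine"         # an apostrophe needs no escaping inside double quotes
same = 'hello' == "hello"    # the quote style does not change the value

# Triple quotes keep the line breaks typed inside the literal.
multi_line = """first line
second line"""

print(single)
print(double)
print(same)                    # True
print(multi_line.split("\n"))  # ['first line', 'second line']
```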

Slice Operation

By definition, a Python string is a sequence of characters, which means you can access each character by its index (the index starts from 0 for the first character):

print(text1[0], text2[1], text3[2])

You can also use the slice operation to get a substring of your string variable:

#get a sub string starting from index 0 and ending at index 5 (exclusive) 
print(text1[0:5])
#get a sub string starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
#get a sub string starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
#get a sub string starting from index 5 and ending at the end of the string
print(text3[5:])

You can also use negative index values, which count from the end of the string (-1 is the last character):

print(text1[-1]) 
print(text3[-3:-1])

The output would be:

! 
in

There is also a third, optional slice parameter, the step, which can be any non-zero integer, e.g.:

text4 = "abcdefg"
print(text4[0::2])
print(text4[1::2])

The output would be:

aceg 
bdf
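
The step can also be negative, which walks the string backwards. Below is a minimal sketch, assuming text4 holds "abcdefg" as the output above suggests. It also shows that slices are clipped to the string length rather than raising an error:

```python
text4 = "abcdefg"  # assumed value, inferred from the output above

reversed_text = text4[::-1]  # step -1 walks from the last character to the first
every_third = text4[::3]     # picks indexes 0, 3 and 6
safe_slice = text4[2:100]    # an end index past the string length is clipped

print(reversed_text)  # gfedcba
print(every_third)    # adg
print(safe_slice)     # cdefg
```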

Immutable nature

Since we can get each individual character from a string, you may wonder whether we can assign something else to a particular position in the string, e.g.:

text4[0] = 'T' 
#TypeError: 'str' object does not support item assignment

The error shows up because a string is immutable: you cannot change anything in its original content, and can only create a new string instead:

new_text4 = "T" + text4[1:]
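
The same idea can be wrapped in a small helper. This is only a sketch: replace_char_at is a hypothetical function written for this article, not something built into Python, and text4 is assumed to hold "abcdefg":

```python
def replace_char_at(text, index, new_char):
    """Return a copy of text with the character at index replaced (hypothetical helper)."""
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    # Build a new string from the part before, the new character and the part after.
    return text[:index] + new_char + text[index + 1:]


text4 = "abcdefg"  # assumed value for this illustration
new_text4 = replace_char_at(text4, 0, "T")

print(new_text4)  # Tbcdefg
print(text4)      # abcdefg, the original string is unchanged
```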

+ and *

You may have noticed in the above example that strings can be concatenated with the “+” operator. There is one more operator, *, that works on strings: it repeats a string a given number of times. For example:

print(text4 + text3 * 2)

This repeats text3 twice and concatenates the copies after text4 into a single string:

abcdefgthis is also finethis is also fine
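
Two practical points about these operators, shown as a rough sketch with made-up values: * is handy for drawing dividers, and + only works between strings, so other types must be converted with str() first:

```python
count = 3  # hypothetical value for illustration

separator = "-" * 10            # repeat one character to draw a divider
label = "items: " + str(count)  # convert non-strings explicitly before using +

try:
    "items: " + count           # mixing str and int is not allowed
except TypeError as error:
    mixed_error = str(error)

print(separator)    # ----------
print(label)        # items: 3
print(mixed_error)  # the TypeError message
```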

Formatting Python string data type

Below are some of the string formatting functions; their names are largely self-explanatory:

print("lower:", text4.lower())
#similar to lower(), but more aggressive, intended for caseless matching
print("casefold:", text4.casefold())
print("upper:", text4.upper())
print("title:", text4.title())
#unlike title(), only the first character of the whole string is capitalized
print("capitalize:", text4.capitalize())
print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))

Below is the output:

lower: abcdefg 
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg
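
With a single lowercase word like text4, some of these functions look identical. The sketch below uses two made-up sample strings to show where lower() and casefold(), and title() and capitalize(), actually differ:

```python
sample = "hello wORLD"  # hypothetical sample text
german = "Stra\u00dfe"  # "Strasse" spelled with the German sharp s (U+00DF)

titled = sample.title()            # capitalizes the first letter of every word
capitalized = sample.capitalize()  # capitalizes only the first letter of the string
lowered = german.lower()           # lower() keeps the sharp s
folded = german.casefold()         # casefold() expands the sharp s to "ss"

print(titled)                          # Hello World
print(capitalized)                     # Hello world
print(folded == "STRASSE".casefold())  # True
print(lowered == "STRASSE".lower())    # False
```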

There are also functions you can use to check the format of a string:

print("isalnum:", text4.isalnum())
print("isalpha:", text4.isalpha())
print("isdecimal:", text4.isdecimal())
print("isdigit:", text4.isdigit())
print("isnumeric:", text4.isnumeric())
print("isidentifier:", text4.isidentifier())
print("islower:", text4.islower())
print("istitle:", text4.istitle())
print("isupper:", text4.isupper())
print("isspace:", text4.isspace())
print("isprintable:", text4.isprintable())

The output will be similar to the below:

isalnum: True 
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True
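
The three numeric checks all return False for text4, so the difference between them is easy to miss. A small sketch with made-up sample strings shows how they differ: isdecimal is the strictest and isnumeric the most permissive:

```python
# Hypothetical samples: plain digits, superscript two, the fraction one half, a float-like text.
samples = ["123", "\u00b2", "\u00bd", "12.5"]

results = {}
for s in samples:
    # Each entry is (isdecimal, isdigit, isnumeric).
    results[s] = (s.isdecimal(), s.isdigit(), s.isnumeric())
    print(repr(s), results[s])

# '123'  -> (True, True, True)
# '²'    -> (False, True, True)
# '½'    -> (False, False, True)
# '12.5' -> (False, False, False), because of the dot
```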

Comparison operations

You can use relational operators such as ==, > and < to compare two strings. Python compares them character by character using each character's Unicode code point, and all the uppercase ASCII letters come before the lowercase ones. Hence you will need to convert your texts into a standard format, e.g. all upper or lower case, in order to get the comparison result in alphabetical order.
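
As a quick sketch of this behaviour (the word list is made up for illustration), an uppercase letter sorts before any lowercase letter unless you compare lowercase copies:

```python
words = ["banana", "Cherry", "apple"]  # hypothetical sample data

upper_first = "Zebra" < "apple"              # True: ord("Z") is 90, ord("a") is 97
default_order = sorted(words)                # uppercase "C" sorts before lowercase letters
alphabetical = sorted(words, key=str.lower)  # compare lowercase copies instead

print(upper_first)    # True
print(default_order)  # ['Cherry', 'apple', 'banana']
print(alphabetical)   # ['apple', 'banana', 'Cherry']
```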

To check whether a string starts or ends with certain characters, you can use the startswith and endswith functions:

if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")

There is no function called contains (people sometimes get confused because the Java string has a contains method), but you can use the in operator, or the find, index and rindex functions, to check whether the string contains a substring:

if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")
if text3.find("this") > -1:
    print("found 'this' in text3")
else:
    print("not found")
if text3.find("this", 1, 20) > -1:
    print("found 'this' in text3")
else:
    print("'this' is not found in text3, starting from index 1 to 20")
#index never returns -1; it raises ValueError when nothing is found
try:
    print("found 'this' in text3, index:", text3.index("this"))
except ValueError:
    print("not found")
#ValueError: substring not found
#idx = text3.index("this", 1, 20)

Both find and index return the index of the substring. The difference between the two is that index raises a ValueError when the substring is not found, while find just returns -1.
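
Because find returns -1 instead of raising, it is convenient in loops. The sketch below uses a hypothetical helper, find_all, written only for this article, to collect every position of a substring, and shows the ValueError from index being caught:

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub (hypothetical helper)."""
    if not sub:
        raise ValueError("sub must not be empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching right after the match that was just found.
        start = text.find(sub, start + len(sub))
    return positions


text3 = "this is also fine"
all_positions = find_all(text3, "is")
last_position = text3.rindex("is")  # rindex searches from the right

print(all_positions)  # [2, 5]
print(last_position)  # 5

try:
    text3.index("xyz")
except ValueError as error:
    print("index raised:", error)
```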

Split & Join texts

Often you will need to split text by a certain delimiter, e.g. a newline (\n), a semicolon (;) or a space. You can use the split function to break the text into a list. If the delimiter is not found, the split function returns a list containing the original text as its only element.

print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))

The output will be:

split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']
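
A few details of split are worth knowing. This is a short sketch with made-up sample strings: the default delimiter collapses runs of whitespace, an explicit delimiter does not, the optional second argument limits the number of splits, and splitlines handles newlines:

```python
messy = "a  b   c"  # hypothetical text with repeated spaces

by_default = messy.split()           # runs of whitespace count as one delimiter
by_space = messy.split(" ")          # every single space is a delimiter
limited = "k=v=w".split("=", 1)      # split at most once
lines = "one\ntwo\n".splitlines()    # split on line breaks, no trailing empty item

print(by_default)  # ['a', 'b', 'c']
print(by_space)    # ['a', '', 'b', '', '', 'c']
print(limited)     # ['k', 'v=w']
print(lines)       # ['one', 'two']
```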

On the other hand, if you have a list of strings and would like to join them into one string, you can do the following:

print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))

And below is the output:

join the words with ';': this;is;also;fine 
join the words without space: thisisalsofine
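
One thing to watch out for: join only accepts strings. A minimal sketch, using a made-up list of numbers, shows the error and the usual fix of converting each item with str() first:

```python
numbers = [1, 2, 3]  # hypothetical sample data

# Convert every item to a string before joining.
joined = ",".join(str(n) for n in numbers)
print(joined)  # 1,2,3

try:
    ",".join(numbers)  # joining non-string items is not allowed
except TypeError as error:
    join_error = str(error)
    print("join failed:", join_error)
```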

Count occurrence

The count function returns the number of occurrences of a substring in the original string, for instance:

print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))

The result will be:

this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10
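
Note that count only counts non-overlapping matches and is case-sensitive. A short sketch with made-up inputs:

```python
overlapping = "aaaa".count("aa")               # non-overlapping matches only
from_index = "this is also fine".count("is", 3)  # start counting at index 3
case_sensitive = "This is".count("t")          # uppercase "T" is not counted
case_insensitive = "This is".lower().count("t")  # normalize the case first

print(overlapping)       # 2
print(from_index)        # 1
print(case_sensitive)    # 0
print(case_insensitive)  # 1
```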

Conclusion

With the examples above, we have covered most of the commonly used functions of the Python string data type. You may also check the official Python documentation to see whether there are any additional string functions you are interested in.

Originally published at https://www.codeforests.com on July 12, 2020.
