In this tutorial, we’ll discuss how you can create and prepare Dataset and String Function using Python in Power BI. The example I’ll use to execute the process is available on GitHub.
Power BI has enabled Python functionality, making it easier for users to run Python scripts directly in Power BI.
Connecting To A Data Source Using Python
Before we begin, make sure that you have opened your Power BI notebook. First, we need to connect to a data source. This will give us access to a CSV file which we can import to our Power BI notebook. The sample dataset I’ll be working on is the Clinic Analysis Dataset.
You can go to GitHub to navigate to your folder and click on your CSV file. If you don’t have one, there are tons of public repositories with uploaded CSV files that you can access. Once you have your dataset, click on the main folder and you’ll see two files. Click on the CSV file.
Then click View Raw.
After that, navigate to the address bar and copy the CSV address. Then go back to your Power BI notebook.
Under the Home tab, click on Get Data then More to see the Web option and click on it.
Next, paste the CSV address on the URL field and click OK.
Then, click the Transform Data button.
We have successfully connected our data source in our Power BI.
Data Cleaning Or Data Wrangling Using Power BI
Now, we can freely navigate on the dataset. This enables us to perform cleaning procedures needed to get our data in the proper format for better analysis.
Under the column Lab Cost, we’ll see that there are various dollar symbols with a dash mark subject for conversion.
We can’t directly convert these symbols.
That’s because it will cause an error, which we’re trying to avoid.
To replace those, click on Replace Values, then input the dollar symbol on the Value To Find field. Place 0 or your desired value on the Replace With field.
Now we can convert these symbols to our desired value. Keep in mind that we can see our activities being documented in the Applied Steps section.
Data Cleaning or Data Wrangling Using Python in Power BI
To make this cleaning process or data wrangling task applicable to all columns, we’ll use the Python function by utilizing Jupyter Notebook.
First, open your Jupyter Notebook and load your homepage.
Under the Files tab, navigate to the right corner and click on ‘New’ to see the ‘Python 3’ option. Then click on it to create a new Python Notebook.
The first thing we have to do is provide a file name. You can name it whatever you like. In this case, I’ll set the title of the file to ‘Clinic Analysis.’
Now that our Python notebook is open, we can see a cell that will allow us to prepare or execute the code. We need to incorporate the web source we did in Power BI.
We can do this by importing one of the essential libraries, which is Pandas, in our environment and run it.
After connecting to our web source and creating a function native to Panda’s library, we want to create something that can indicate what we’re about to do.
We’ll explore our data and create a heading as an indicator for this activity. To do that, we need to change this from a code to a markdown.
With a pound sign, we can control our font size. If we want our font to be in a small size, we need to type in various pound signs.
On the other hand, if we want a bigger font size, we need to use fewer pound signs.
Additionally, we might want to include a brief explanation of our activity under our header. Let’s say we’ll explore native functions and datasets.
If we’ll run it, it will automatically appear underneath our header.
Exploring the Dataset In Power BI Using Python
On a new cell, we can explore the dataset. Let’s say we want to explore the first five rows of data which is actually the default argument, and we need to make a new variable data.
This will give you access to the first five rows. But remember, you can put any argument inside the parenthesis indicating how many rows of data you want to access. In this example, I want to access ten rows.
However, we’re now having that same issue we had in our Power BI notebook regarding the dollar symbols with a dash mark.
In order to fix this, we need to build our own user-defined function that we can apply to clean the format of our data.
We can start the cleaning process by obtaining more information regarding our data. This will make it easier for us to identify the issues in our dataset.
After running the function, we’re now able to see in-depth information about the dataset—the number of entries, the columns, and the data type. We can also see that we have strings that represent texts within our Python environment.
If we look closely at the column, there are several indentations that we need to fix. I’ll show you the two ways to clean this.
Two Ways Of Cleaning Data Using Python
The first one is by manually fixing the issue. First, gather the information regarding your data columns. In order to do so, type in data.columns in a new code cell.
In a new cell, you want to document the activity of cleaning the spaces in the column headings. So, copy and paste the information in the cell and directly delete the white spaces. Another way is by using a string function that removes white spaces.
After running that, it will immediately eliminate the unnecessary white spaces in our strings. Doing this method provides convenience for users, especially when we have too much data or strings to fix or clean.
Next, we have to save and incorporate our previous code in this cell before we run it.
As a result, we have fixed the issue and the columns on our dataset are in the proper format. Everything is now aligned properly as opposed to having white spaces from before.
To summarize, using the dataset and string function can enhance the way Power BI works and extend the normal capabilities of the tool.
Python is a popular high-level programming language and it has lots of great potential when it comes to getting data using only a few lines of code. Hopefully, this tutorial has given you an understanding of Python in Power BI.
All the best,