Building a Beginner Streamlit Application for Data Tasks


Introduction to Streamlit and Installation

Streamlit is an open-source Python library that makes it easy to create and share custom web applications for machine learning and data science. With Streamlit, you can quickly build interactive tools and dashboards. This guide will walk you through installing Streamlit and creating a basic application.

Prerequisites

  • Ensure you have Python installed on your system. You can download it from python.org.

Installation

To get started with Streamlit, you need to install it. Follow the steps below:

  1. Create a Virtual Environment:

    It’s a good practice to create a virtual environment for your projects to manage dependencies.

    python -m venv myenv
    source myenv/bin/activate # On Windows, use `myenv\Scripts\activate`
  2. Install Streamlit:

    Use pip, Python’s package installer, to install Streamlit.

    pip install streamlit
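To confirm that the install landed in the active virtual environment, one option is to ask Python for the installed version. The sketch below uses the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of Streamlit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if Streamlit is installed in this environment, otherwise None
print(installed_version("streamlit"))
```

If this prints None, the virtual environment is probably not activated in the current shell.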

Creating a Basic Streamlit Application

Once Streamlit is installed, you can create a simple application. Follow these steps:

  1. Create a New Python File:

    Create a new file named app.py.

    # app.py
    import streamlit as st

    # Title of the application
    st.title('Basic Streamlit Application')

    # Add a header
    st.header('Welcome to your first Streamlit app')

    # Add some text
    st.write('Streamlit is awesome!')

    # Add a simple data display
    data = {
        'First Column': [1, 2, 3, 4],
        'Second Column': [10, 20, 30, 40]
    }
    st.write(data)

  2. Running the Application:

    To run your Streamlit application, use the Streamlit CLI command:

    streamlit run app.py

    This command will start a local web server. Open your web browser and navigate to the URL provided in the terminal (usually http://localhost:8501).

Exploring the Web Interface

  • Title and Header: You will see the title and header at the top of the page.
  • Text: Below it, the text ‘Streamlit is awesome!’ will be displayed.
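  • Data: Finally, you will see the data. Because data is a plain dictionary, st.write renders it as an expandable JSON-style view; wrapping it in a pandas DataFrame gives a table instead.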
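st.write chooses a renderer from the type of its argument, so a plain dictionary and a DataFrame built from it can look different on the page. A minimal sketch, assuming pandas is installed, that builds a DataFrame from the same dictionary so it can be passed to st.write or st.dataframe for a tabular view:

```python
import pandas as pd

data = {
    "First Column": [1, 2, 3, 4],
    "Second Column": [10, 20, 30, 40],
}

# Each dictionary key becomes a column; each list position becomes a row
df = pd.DataFrame(data)
print(df)
```

In app.py, calling st.write(df) or st.dataframe(df) would then display the four rows as a table.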

With these steps, you have successfully set up a basic Streamlit application. In the next sections of this guide, we will explore more advanced features and customization options available in Streamlit.

Setting Up the Development Environment

Prerequisites

Before we proceed, ensure you have completed the following:

  • Installed Streamlit as described in the previous section.
  • Familiarized yourself with the basic concepts of Streamlit from the “Introduction to Streamlit” section.

Directory Structure

Set up your project directory structure as follows:

my_streamlit_app/
├── app.py
├── data/
│   └── sample_data.csv
└── requirements.txt

Creating requirements.txt

Ensure requirements.txt exists with the necessary dependencies:

streamlit
pandas
numpy

Developing the Application

app.py

Create and open the app.py file in your project directory. Populate it with the basic structure of a Streamlit app:

import streamlit as st
import pandas as pd
import numpy as np

# Title of the application
st.title("Basic Streamlit Application")

# Load data
def load_data():
    data = pd.read_csv('data/sample_data.csv')
    return data

# Main function
def main():
    data = load_data()
    
    # Display a header
    st.header("Data Overview")
    
    # Display the dataframe
    st.write(data)
    
    # Statistics
    st.subheader("Statistics")
    st.write(data.describe())
    
    # Add more functionality as needed for your data-related tasks

if __name__ == "__main__":
    main()

Sample Data

Add a CSV file (sample_data.csv) inside the data directory to work with. Ensure it contains some sample data like this:

index,value
0,10
1,20
2,30
3,40
4,50
5,60

Running the Application

With everything set up, navigate to your project directory in your command-line interface and run the following command:

streamlit run app.py

Your web browser should open the basic Streamlit application, displaying your dataset, basic statistics, and the additional functionalities you may have included.


At this point, your development environment should be properly set up, allowing you to expand and customize your Streamlit app as needed.

Loading and Displaying Data in Streamlit

Step 1: Import Required Libraries

import streamlit as st
import pandas as pd

Step 2: Load Data

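# st.cache has been deprecated since Streamlit 1.18; st.cache_data is its replacement for data
@st.cache_data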
def load_data(filepath):
    data = pd.read_csv(filepath)
    return data

uploaded_file = st.file_uploader("Choose a CSV file", type="csv")
if uploaded_file is not None:
    data = load_data(uploaded_file)
    st.write("Data Loaded Successfully!")

Step 3: Display Data

if uploaded_file is not None:
    st.dataframe(data)

Step 4: Putting It All Together

import streamlit as st
import pandas as pd

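# st.cache has been deprecated since Streamlit 1.18; st.cache_data is its replacement for data
@st.cache_data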
def load_data(filepath):
    data = pd.read_csv(filepath)
    return data

uploaded_file = st.file_uploader("Choose a CSV file", type="csv")
if uploaded_file is not None:
    data = load_data(uploaded_file)
    st.write("Data Loaded Successfully!")
    st.dataframe(data)

Explanation

  1. Import Required Libraries: The Streamlit (st) and pandas (pd) libraries are imported for creating the application and handling data, respectively.

  2. Load Data:

    • The load_data function loads the data from a CSV file using pandas.
    • The caching decorator (@st.cache in older releases, @st.cache_data from Streamlit 1.18 onward) stores the loaded data so that reruns with the same file skip the loading step.
    • The file_uploader widget allows users to upload a CSV file.
    • If a file is uploaded, the load_data function is called to load the data.
  3. Display Data:

    • The dataframe method of Streamlit displays the data in a tabular format.
    • This is executed only if an uploaded file is successfully loaded.
  4. Putting It All Together:

    • Combine all steps in a single script.
    • This script, when run in a Streamlit environment, will present a file uploader, load the file into a pandas DataFrame when provided, and display it within the Streamlit app.

Copy this code and run it in your Streamlit environment to load and display your data interactively.
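The object returned by st.file_uploader behaves like an in-memory binary file, which is why it can be handed straight to pd.read_csv. The sketch below imitates an upload with io.BytesIO so the loading step can be tried outside Streamlit; the CSV contents are made up for illustration.

```python
import io

import pandas as pd


def load_data(file_like) -> pd.DataFrame:
    """Read CSV data from a path or any file-like object."""
    return pd.read_csv(file_like)


# Stand-in for an uploaded file: raw CSV bytes wrapped in a file-like object
fake_upload = io.BytesIO(b"index,value\n0,10\n1,20\n2,30\n")
data = load_data(fake_upload)
print(data)
```

Keeping load_data free of Streamlit calls also makes it straightforward to unit test.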

Basic Data Visualization in Streamlit

Streamlit allows you to create interactive and informative visualizations with ease. Below is a practical implementation for adding basic data visualizations to your Streamlit application. This assumes that you have already loaded your data and that it is available as a Pandas DataFrame named df.

Implementing Basic Data Visualizations in Streamlit

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assuming 'df' is already loaded in previous steps
# Sample DataFrame for demonstration
df = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Values': [23, 45, 56, 78]
})

st.title("Basic Data Visualization")
st.write("This section displays basic data visualizations using Streamlit's built-in charts, Matplotlib, and Seaborn.")

# Bar Chart
st.subheader("Bar Chart")
st.bar_chart(df[['Category', 'Values']].set_index('Category'))

# Line Chart
st.subheader("Line Chart")
st.line_chart(df[['Category', 'Values']].set_index('Category'))

# Area Chart
st.subheader("Area Chart")
st.area_chart(df[['Category', 'Values']].set_index('Category'))

# Matplotlib Figure - Scatter Plot
st.subheader("Scatter Plot")
# Create an explicit figure and axes instead of relying on pyplot's global state
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['Category'], df['Values'], color='red')
ax.set_title('Scatter Plot')
ax.set_xlabel('Category')
ax.set_ylabel('Values')
st.pyplot(fig)

# Seaborn Plot
st.subheader("Seaborn Plot")
fig, ax = plt.subplots()
sns.barplot(x='Category', y='Values', data=df, ax=ax)
ax.set(title='Seaborn Barplot', xlabel='Category', ylabel='Values')
st.pyplot(fig)

Explanation

  1. Bar Chart: Uses Streamlit’s built-in st.bar_chart() to display a bar chart with the data.
  2. Line Chart: Uses Streamlit’s built-in st.line_chart() to display a line chart with the data.
  3. Area Chart: Uses Streamlit’s built-in st.area_chart() to display an area chart with the data.
  4. Scatter Plot: Uses Matplotlib to create a scatter plot. The plot is displayed using st.pyplot().
  5. Seaborn Plot: Uses Seaborn to create a bar plot, and st.pyplot() is used to display the figure created by Seaborn.
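The three built-in chart calls share the same set_index('Category') step. When no x column is specified, Streamlit's built-in charts take the x-axis from the DataFrame index, so moving Category into the index is what labels the bars A to D. A small pandas-only sketch of that reshaping:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "C", "D"],
    "Values": [23, 45, 56, 78],
})

# Move Category into the index; Values remains the only plotted column
chart_df = df.set_index("Category")
print(chart_df)
```

The resulting chart_df is what st.bar_chart, st.line_chart, and st.area_chart receive in the code above.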

With this code, you can add clear, informative visualizations to your Streamlit application. Adapt the data and chart types to your project's requirements.

Interactive Widgets in Streamlit

In this section, we will add interactive widgets to your Streamlit application. These widgets will help your users to interact with the data dynamically. We will cover a few essential interactive widgets, including sliders, select boxes, and text inputs, to enhance user engagement.

Example Implementation

import streamlit as st
import pandas as pd
import numpy as np


# Load sample data (assuming it's a DataFrame)
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [10, 20, 30, 40]
})

# Display DataFrame
st.write("Dataset:")
st.dataframe(data)

# Interactive Widget 1: Slider
st.write("Adjust the sliders to filter data based on value.")

min_val = st.slider('Select minimum value', min_value=int(data['Value'].min()), max_value=int(data['Value'].max()), value=int(data['Value'].min()))
max_val = st.slider('Select maximum value', min_value=int(data['Value'].min()), max_value=int(data['Value'].max()), value=int(data['Value'].max()))

filtered_data = data[(data['Value'] >= min_val) & (data['Value'] <= max_val)]
st.write("Filtered Data:")
st.dataframe(filtered_data)

# Interactive Widget 2: Select Box
st.write("Choose a category to filter the data.")

category = st.selectbox('Select category', options=data['Category'].unique())
category_filtered_data = data[data['Category'] == category]
st.write("Category Filtered Data:")
st.dataframe(category_filtered_data)

# Interactive Widget 3: Text Input
st.write("Input a custom message.")

custom_message = st.text_input('Enter message', 'Hello, Streamlit!')
st.write(f"Your message: {custom_message}")

# Additional example for combining widgets
st.write("Combine Slider and Select Box to filter data.")

combined_min_val = st.slider('Select minimum value for combined filter', min_value=int(data['Value'].min()), max_value=int(data['Value'].max()), value=int(data['Value'].min()))
combined_max_val = st.slider('Select maximum value for combined filter', min_value=int(data['Value'].min()), max_value=int(data['Value'].max()), value=int(data['Value'].max()))
combined_category = st.selectbox('Select category for combined filter', options=data['Category'].unique())

combined_filtered_data = data[(data['Value'] >= combined_min_val) & (data['Value'] <= combined_max_val) & (data['Category'] == combined_category)]
st.write("Combined Filtered Data:")
st.dataframe(combined_filtered_data)

Explanation

  1. Loading Data:

    • A sample DataFrame is created for demonstration purposes.
    • The data is displayed using st.dataframe().
  2. Sliders:

    • Two sliders are used to select minimum and maximum values.
    • The data is filtered based on these values.
  3. Select Box:

    • A select box is used to filter the data based on category.
    • The filtered data is displayed accordingly.
  4. Text Input:

    • A text input widget is provided for users to input a custom message which is then displayed.
  5. Combining Widgets:

    • A combined filtering approach leveraging both sliders and a select box to filter data is demonstrated.

This implementation incorporates interactive components to make the Streamlit application dynamic and user-friendly.
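Because the minimum and maximum sliders are independent, a user can set the minimum above the maximum, which silently yields an empty table. One way to handle this, sketched here as a plain pandas helper (the name filter_rows is hypothetical), is to normalize the bounds before filtering:

```python
from typing import Optional

import pandas as pd


def filter_rows(data: pd.DataFrame, low: float, high: float,
                category: Optional[str] = None) -> pd.DataFrame:
    """Keep rows whose Value lies in [low, high], optionally for one Category."""
    # Swap the bounds if the user dragged the minimum past the maximum
    low, high = min(low, high), max(low, high)
    mask = data["Value"].between(low, high)
    if category is not None:
        mask &= data["Category"] == category
    return data[mask]


data = pd.DataFrame({"Category": ["A", "B", "C", "D"], "Value": [10, 20, 30, 40]})
print(filter_rows(data, 35, 15))       # bounds are swapped: rows B and C
print(filter_rows(data, 10, 40, "D"))  # only row D
```

In the app, the slider and select box values would be passed in as low, high, and category. A single st.slider with a tuple as its value, which returns a (low, high) pair, avoids the problem at the widget level.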

Advanced Data Visualization Techniques

For this part of the project, we will dive into creating advanced data visualizations within a Streamlit application using more complex plots and interactivity.

Streamlit Code for Advanced Visualizations

import streamlit as st
import pandas as pd
import numpy as np
import altair as alt

# Load the data
# Add parse_dates=[...] with your date column name(s) so they load as datetime (Plot 3 needs one)
df = pd.read_csv('your_data.csv')

# Create a sidebar for user inputs
st.sidebar.header('User Inputs')

# Example user input: Selecting a column for visualization
feature = st.sidebar.selectbox('Select a feature for visualization', df.columns)

# Plot 1: Histogram with interactive bin size
bins = st.sidebar.slider('Select number of bins for histogram', min_value=10, max_value=100, value=30, step=10)
hist = alt.Chart(df).mark_bar().encode(
    alt.X(f'{feature}:Q', bin=alt.Bin(maxbins=bins)),
    y='count()'
).properties(
    title=f'Histogram of {feature}'
)
st.altair_chart(hist, use_container_width=True)

# Plot 2: Scatter plot with color grouping
color_feature = st.sidebar.selectbox('Select a feature for color grouping in scatter plot', df.columns, index=1)
scatter = alt.Chart(df).mark_circle(size=60).encode(
    x=alt.X(f'{feature}:Q'),
    y=alt.Y(f'{color_feature}:Q'),
    color=f'{color_feature}:N',
    tooltip=[feature, color_feature]
).interactive().properties(
    title=f'Scatter plot of {feature} vs {color_feature}'
)
st.altair_chart(scatter, use_container_width=True)

# Plot 3: Line chart with interactive date range
date_feature = st.sidebar.selectbox('Select a date feature for line chart', df.columns[df.dtypes == 'datetime64[ns]'])
date_range = st.sidebar.slider('Select date range', min_value=df[date_feature].min(), max_value=df[date_feature].max(), value=(df[date_feature].min(), df[date_feature].max()))
filtered_df = df[(df[date_feature] >= date_range[0]) & (df[date_feature] <= date_range[1])]

line_chart = alt.Chart(filtered_df).mark_line().encode(
    x=alt.X(f'{date_feature}:T'),
    y=f'{feature}:Q'
).properties(
    title=f'Line Chart of {feature} over time'
)
st.altair_chart(line_chart, use_container_width=True)

Key Features Showcased:

  1. Histogram with Interactive Bin Size:
    Adjust the number of bins via a slider to see how it affects the distribution of the selected feature.

  2. Scatter Plot with Interactive Color Grouping:
    The feature chosen for the histogram is plotted on the x-axis, and a second selected feature is used for both the y-axis and the color grouping. The plot updates based on your selections.

  3. Line Chart with Date Range Filter:
    Select a date range to filter the data shown in the line chart, allowing for dynamic analysis of trends over time.
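The date-range filter only works if the date column is actually stored as a datetime type; pd.read_csv loads dates as plain strings unless told otherwise. A minimal sketch, assuming a hypothetical column named date and made-up values, of parsing on load and then applying the same kind of range filter:

```python
import io

import pandas as pd

# Made-up CSV content standing in for your_data.csv
raw = io.StringIO("date,sales\n2024-01-01,5\n2024-01-02,7\n2024-01-03,9\n")

# parse_dates converts the named column to a datetime dtype while reading
df = pd.read_csv(raw, parse_dates=["date"])
date_columns = df.select_dtypes(include="datetime").columns.tolist()

# Inclusive range filter, as the sidebar slider would supply it
start, end = pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-03")
filtered_df = df[(df["date"] >= start) & (df["date"] <= end)]
print(date_columns, len(filtered_df))
```

Without parse_dates, date_columns would be empty and the date select box in the app would have nothing to offer.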


Running the Application

To display this advanced visualization Streamlit app, save the above code into a file named app.py and run it using the Streamlit command:

streamlit run app.py

This command will launch your browser and show the advanced visualization techniques using your specific dataset.

Adding Interactivity to Visualizations in Streamlit

Import Libraries

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load Data

# Placeholder for data loading - replace with actual data loading
df = pd.read_csv('your_dataset.csv')

Interactive Visualization

Let’s create an interactive scatter plot which allows users to select different variables for the x and y axes.

# Sidebar for user input
st.sidebar.header('User Input Features')

# Get the column names for user selection
columns = df.columns.tolist()

x_axis = st.sidebar.selectbox('Select X-axis variable', columns)
y_axis = st.sidebar.selectbox('Select Y-axis variable', columns)

# Plotting the interactive scatter plot
fig, ax = plt.subplots()
sns.scatterplot(data=df, x=x_axis, y=y_axis, ax=ax)
ax.set_title(f'Scatter Plot of {x_axis} vs {y_axis}')

# Display the plot
st.pyplot(fig)

Add More Interactivity – Filter Data

We’ll add sliders to filter the data based on a chosen numerical column.

# Sidebar for filtering data
filter_column = st.sidebar.selectbox('Select column to filter', df.select_dtypes(include=['float64', 'int64']).columns)
min_value = float(df[filter_column].min())
max_value = float(df[filter_column].max())
filter_values = st.sidebar.slider(f'Select range of {filter_column}', min_value, max_value, (min_value, max_value))

# Filter the dataframe based on the user's selection
filtered_df = df[(df[filter_column] >= filter_values[0]) & (df[filter_column] <= filter_values[1])]

# Create an updated scatter plot based on filtered data
fig, ax = plt.subplots()
sns.scatterplot(data=filtered_df, x=x_axis, y=y_axis, ax=ax)
ax.set_title(f'Scatter Plot of {x_axis} vs {y_axis} (Filtered)')

# Display the updated plot
st.pyplot(fig)

Conclusion

These steps demonstrate how to add interactivity to visualizations in Streamlit using widgets like selectbox and slider, allowing users to dynamically choose variables for the axes and filter the dataset. You can extend these techniques to other types of plots and interactivity features.
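The filter column list above is built from float64 and int64 columns only, so columns stored with other numeric widths (int32, float32, and so on) would not appear in the select box. A short comparison, using a made-up DataFrame, of that selection against pandas' broader 'number' selector:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype="int32"),
    "ratio": [0.1, 0.2, 0.3],
    "label": ["x", "y", "z"],
})

# Explicit dtype list: misses the int32 column
narrow = df.select_dtypes(include=["float64", "int64"]).columns.tolist()

# 'number' matches every numeric dtype
broad = df.select_dtypes(include="number").columns.tolist()

print(narrow)  # ['ratio']
print(broad)   # ['small_int', 'ratio']
```

Which selector fits depends on how your dataset was loaded; data read with pd.read_csv defaults usually arrives as int64 and float64 anyway.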

Building a Complete Data Dashboard with Streamlit

To build a complete data dashboard using Streamlit, you will combine the functionalities of data loading, visualization, and user interactivity to present a cohesive and interactive data analysis tool. Below is the practical implementation with Streamlit:

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load and Display Data
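# st.cache has been deprecated since Streamlit 1.18; st.cache_data is its replacement for data
@st.cache_data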
def load_data(filepath):
    return pd.read_csv(filepath)

data = load_data('path/to/your/data.csv')
st.title("Data Dashboard")
st.write("A complete data dashboard using Streamlit")

st.header("Data Preview")
st.dataframe(data.head())

# Basic Data Visualization
st.header("Basic Data Visualization")
st.subheader("Choose feature for histogram")
feature = st.selectbox("Select feature", data.columns)
fig, ax = plt.subplots()
sns.histplot(data[feature], kde=True, ax=ax)
st.pyplot(fig)

# Advanced Data Visualization
st.header("Advanced Data Visualization")
st.subheader("Correlation Heatmap")
if st.checkbox("Show Heatmap"):
    corr = data.corr(numeric_only=True)  # ignore non-numeric columns; needs pandas 1.5 or newer
    fig, ax = plt.subplots()
    sns.heatmap(corr, annot=True, ax=ax, cmap='coolwarm')
    st.pyplot(fig)

# Adding Interactivity to Visualizations
st.header("Interactive Visualizations")
st.subheader("Scatter Plot with Filters")

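# Restrict the X-axis to numeric columns because the sliders below filter on it
x_axis = st.selectbox("Choose X-axis", data.select_dtypes(include="number").columns)
y_axis = st.selectbox("Choose Y-axis", data.columns)
hue = st.selectbox("Choose hue", [None] + data.columns.tolist())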
st.write("Choose filters for the data")
min_value = st.slider("Min Value", min_value=int(data[x_axis].min()), max_value=int(data[x_axis].max()), value=int(data[x_axis].min()))
max_value = st.slider("Max Value", min_value=int(data[x_axis].min()), max_value=int(data[x_axis].max()), value=int(data[x_axis].max()))

filtered_data = data[(data[x_axis] >= min_value) & (data[x_axis] <= max_value)]

fig, ax = plt.subplots()
sns.scatterplot(data=filtered_data, x=x_axis, y=y_axis, hue=hue, ax=ax)
st.pyplot(fig)

# Summary Statistics and Metrics
st.header("Summary Statistics")
summary_metrics = data.describe().T
st.write(summary_metrics)

if st.checkbox("Show metrics for a specific feature"):
    metric = st.selectbox("Select feature for metrics", data.columns)
    st.write(data[metric].describe())

# Adding a Download Button
st.header("Download Filtered Data")
csv = filtered_data.to_csv(index=False)
st.download_button(
    label="Download filtered data as CSV",
    data=csv,
    file_name='filtered_data.csv',
    mime='text/csv',
)

# Run the Streamlit app
# To run, use command in terminal: streamlit run <name_of_this_script.py>

This implementation integrates the essential components of a usable, interactive dashboard in Streamlit: a data preview, basic and advanced visualizations, interactive filters and plots, and summary statistics. It also includes a download button so users can export the filtered dataset.
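A correlation matrix is only defined for numeric columns, and recent pandas versions raise an error if DataFrame.corr() meets text columns. One defensive option, sketched with made-up data, is to select the numeric columns first:

```python
import pandas as pd

data = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "name": ["a", "b", "c", "d"],
})

# Drop non-numeric columns before computing pairwise correlations
corr = data.select_dtypes(include="number").corr()
print(corr)
```

The resulting corr frame can be passed to sns.heatmap exactly as in the dashboard code.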

Testing and Debugging Your Streamlit App

When working on a Streamlit application, effective testing and debugging are vital to ensure the app’s functionality and performance. Here are some practical steps and code examples to help you test and debug your Streamlit app.

1. Unit Testing with unittest

To ensure that individual pieces of your Streamlit app function correctly, you can use Python’s built-in unittest framework.

Example: Unit Testing a Data Processing Function

import unittest

def process_data(data):
    # Simple function to demonstrate testing
    return [i**2 for i in data]

class TestProcessData(unittest.TestCase):
    def test_process_data(self):
        self.assertEqual(process_data([1, 2, 3]), [1, 4, 9])
        self.assertEqual(process_data([]), [])

if __name__ == '__main__':
    unittest.main()

Running Unit Tests

To execute the tests, save the code to a file called test_example.py and run:

python test_example.py
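The same approach extends to the pandas code behind a dashboard, provided the logic lives in plain functions that do not call Streamlit. The sketch below tests a hypothetical summarize helper and runs the suite programmatically, which can be convenient from a notebook or another script:

```python
import unittest

import pandas as pd


def summarize(df: pd.DataFrame, column: str) -> dict:
    """Return a few headline numbers for one numeric column."""
    return {
        "count": int(df[column].count()),
        "mean": float(df[column].mean()),
        "max": float(df[column].max()),
    }


class TestSummarize(unittest.TestCase):
    def test_basic_values(self):
        df = pd.DataFrame({"value": [10, 20, 30]})
        self.assertEqual(summarize(df, "value"), {"count": 3, "mean": 20.0, "max": 30.0})

    def test_missing_column_raises(self):
        with self.assertRaises(KeyError):
            summarize(pd.DataFrame({"value": [1]}), "other")


def run_tests() -> unittest.TestResult:
    """Load and run the test case without exiting the interpreter."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSummarize)
    return unittest.TextTestRunner(verbosity=1).run(suite)
```

Calling run_tests() returns a result object whose wasSuccessful() method reports whether every test passed.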

2. Adding Debug Statements

Inserting debug statements within your application helps you understand the flow and identify issues.

Example: Using Debug Statements in Streamlit

import streamlit as st

def main():
    st.title("Debugging Example App")

    data = [1, 2, 3, 4]
    st.write(f"Initial data: {data}")  # Debug statement
    st.write(f"Processed data: {process_data(data)}")  # Debug statement

def process_data(data):
    st.write(f"Processing data: {data}")  # Debug statement
    return [i**2 for i in data]

if __name__ == "__main__":
    main()

3. Using Logging

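Implement logging to record runtime information in the terminal where Streamlit is running, or in a file if you add a file handler, without cluttering the app’s interface.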

Example: Implementing Logging

import streamlit as st
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def main():
    st.title("Logging Example App")

    data = [1, 2, 3, 4]
    logger.info(f"Initial data: {data}")  # Logger statement
    processed_data = process_data(data)
    logger.info(f"Processed data: {processed_data}")  # Logger statement

    st.write(f"Processed data: {processed_data}")

def process_data(data):
    logger.info(f"Processing data: {data}")  # Logger statement
    return [i**2 for i in data]

if __name__ == "__main__":
    main()
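Two details are worth handling when logging from Streamlit. basicConfig as shown writes to the terminal, so the messages are gone once the terminal is closed, and Streamlit re-executes the script from top to bottom on every interaction, so handlers attached with addHandler at the top level can pile up and duplicate each message. A minimal sketch of a file-backed logger that guards against duplicate handlers (the name get_logger and the log file name are assumptions):

```python
import logging
from pathlib import Path


def get_logger(name: str, log_path: Path) -> logging.Logger:
    """Return a logger that appends to log_path, adding the handler only once."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # skipped on reruns, when the handler already exists
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Near the top of app.py this could be used as logger = get_logger("app", Path("app.log")), after which logger.info(...) calls append to the file across reruns.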

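4. Debugging Cached Computations

Streamlit’s caching decorators (st.cache in older releases, replaced by st.cache_data and st.cache_resource since version 1.18) help optimize performance but can also hide bugs: a cached function’s body, including any logging inside it, runs only when its inputs or its code change.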

Example: Implementing st.cache with Debugging

import streamlit as st
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

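# st.cache has been deprecated since Streamlit 1.18; st.cache_data is its replacement for data
@st.cache_data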
def process_data(data):
    logger.info("Running cached process_data function")  # Logger statement
    return [i**2 for i in data]

def main():
    st.title("Cache Example App")

    data = [1, 2, 3, 4]
    logger.info(f"Initial data: {data}")
    processed_data = process_data(data)
    logger.info(f"Processed data: {processed_data}")

    st.write(f"Processed data: {processed_data}")

if __name__ == "__main__":
    main()

5. Using Streamlit’s st.write for Debugging

st.write can be used throughout your Streamlit app for simple, quick debugging.

Example: Debugging with st.write

import streamlit as st

def main():
    st.title("Simple Debugging App")

    data = [1, 2, 3, 4]
    st.write("Initial data:", data)  # Debug output
    processed_data = process_data(data)
    st.write("Processed data:", processed_data)  # Debug output

def process_data(data):
    st.write("Processing data...")  # Debug output
    return [i**2 for i in data]

if __name__ == "__main__":
    main()

Applying these practices will help you test and debug your Streamlit application effectively, ensuring it works as intended and is free of bugs.

Deploying Your Streamlit Application

To deploy your Streamlit application, you can use Streamlit Sharing or another cloud platform such as Heroku. Below is a practical guide to both.

Streamlit Sharing

  1. Sign Up for Streamlit Sharing

    Head to Streamlit Sharing, since renamed Streamlit Community Cloud, and sign in with your GitHub account. It is a free platform provided by Streamlit for hosting applications.

  2. Prepare Your Repository

    Ensure your project is hosted on GitHub, and it should include:


    • app.py or the main Python file for your Streamlit app.

    • requirements.txt listing all your dependencies.

    Example of requirements.txt:

    streamlit
    pandas
    matplotlib
  3. Deploy the App

    a. Go to your Streamlit Sharing account.

    b. Click “New App” and link your GitHub repository.

    c. Follow the instructions on-screen to fill in details like branch and the main file path.

  4. Run the App

    After configuration, hit ‘Deploy’ and your app will be live with a shareable link provided by Streamlit.


Deploy using Heroku

  1. Prepare Your Project

    Ensure your project includes:

    • app.py or the main Python file for your Streamlit app.

    • requirements.txt listing all your dependencies.

    • Procfile to specify the command to run.


    Example of requirements.txt:

    streamlit
    pandas
    matplotlib

    Example of Procfile:

    web: sh setup.sh && streamlit run app.py

  2. Create a setup.sh

    The setup.sh script writes Streamlit’s server configuration before the app starts. Example:

    mkdir -p ~/.streamlit/
    echo "\
    [server]\n\
    headless = true\n\
    port = $PORT\n\
    enableCORS = false\n\
    \n\
    " > ~/.streamlit/config.toml

  3. Deploy to Heroku

    a. Install Heroku CLI and login:

    heroku login

    b. Create a new Heroku app:

    heroku create your-app-name

    c. Push your code to Heroku:

    git add .
    git commit -m "Initial commit"
    git push heroku master  # use 'main' instead if that is your branch name

  4. Scale the App

    Make sure at least one dyno is running:

    heroku ps:scale web=1

  5. Open Your Deployed App

    Open your app using:

    heroku open
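The setup.sh script above relies on echo interpreting \n escapes, which varies between shells. As an alternative sketch in Python, under the assumption that the platform exposes the port in a PORT environment variable as the script implies, the same config.toml could be written like this (write_streamlit_config is a hypothetical helper name):

```python
from pathlib import Path


def write_streamlit_config(home: Path, port: str) -> Path:
    """Write the [server] settings from setup.sh to <home>/.streamlit/config.toml."""
    config_dir = home / ".streamlit"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.toml"
    config_path.write_text(
        "[server]\n"
        "headless = true\n"
        f"port = {port}\n"
        "enableCORS = false\n",
        encoding="utf-8",
    )
    return config_path
```

A small launcher script could call write_streamlit_config(Path.home(), os.environ["PORT"]) before starting Streamlit; the Procfile would then need to invoke that script in place of setup.sh.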

Your Streamlit app should now be live using the provided Heroku domain or Streamlit Sharing link.
