Discovering Altair - bar plots

Mar 20, 2021

The first day of spring seems like a good time to finally launch the series ‘Things I learned this week’ :) Today I’d like to share a way to make a stacked and grouped bar plot with Altair.

This post can also be viewed as a Jupyter notebook - link.

What & why:

Recently we were investigating solvent effects on the NMR shielding of transition metal nuclei in multiple complexes, for which we used a few approximations within a few computational techniques. In short, a lot of information to track.

It is not easy to find a recipe for effective visualization in this case, especially if one wants to capture all important data in one figure. Here I show how Altair can be used to plot the contributions to the solvent effects for the whole series and two calculation methods.

Data description:

df1 and df2 collect the data from the two calculation methods (read from the data1.csv and data2.csv files).
$v_1$ and $v_{ref}$ are two boundary reference values:
- $v_1$ - corresponds to the property value with no effect included
- $v_{ref}$ - corresponds to the property value with a total effect included
- therefore this effect is estimated as: $v = v_{ref} - v_1$
$v_2$ and $v_3$ are approximations to $v_{ref}$. We then define:
- $\Delta(1) = v_2 - v_1$
- $\Delta(2) = v_3 - v_2$
- $\Delta(3) = v_{ref} - v_3$
  In particular, $\Delta(3)$ can be interpreted as the portion of the effect that is not described by approximations $v_2$ and $v_3$.
I use fake data in this exercise.
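As a quick sanity check of these definitions, here is a minimal sketch using the Cr row of the fake data shown further below. It only illustrates that the three contributions telescope to the total effect $v = v_{ref} - v_1$; it is not part of the plotting code.

```python
# Cr row of the fake data (values copied from the data1.csv table in this post)
v1, v2, v3, vref = -2944.2912, -2813.8701, -2796.4468, -2863.6800

delta1 = v2 - v1          # Delta(1)
delta2 = v3 - v2          # Delta(2)
delta3 = vref - v3        # Delta(3)
total_effect = vref - v1  # v

# v2 and v3 cancel in the sum, so the contributions add up to the total effect
assert abs((delta1 + delta2 + delta3) - total_effect) < 1e-9

print(round(delta1, 4), round(delta2, 4), round(delta3, 4), round(total_effect, 4))
# 130.4211 17.4233 -67.2332 80.6112
```

This is why the stacked bars and the separate 'total effect' marks in the final plot can be compared directly.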

import pandas as pd
import altair as alt

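def prep_data(df, name):
    """Return long-form frames with the contributions and with the total effect."""
    df = df.copy()  # do not modify the caller's dataframe
    df['name'] = name
    df_te = df[['mol', 'name']].copy()

    # individual contributions to the effect
    df['delta1'] = df['v2'] - df['v1']
    df['delta2'] = df['v3'] - df['v2']
    df['delta3'] = df['vref'] - df['v3']

    # total effect
    df_te['total_effect'] = df['vref'] - df['v1']

    # drop the raw values and reshape to long form (one row per mol/name/variable)
    df = df.drop(['v1', 'v2', 'v3', 'vref'], axis=1)
    df = df.melt(id_vars=['mol', 'name'])

    df_te = df_te.melt(id_vars=['mol', 'name'])

    return df, df_te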
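The reshaping step relies on `melt`, which turns a wide table into the long form that Altair prefers. Below is a minimal sketch of what that call does; the tiny dataframe and its numbers are made-up placeholders, not the data from this post.

```python
import pandas as pd

# Hypothetical wide table: one row per molecule, one column per contribution
wide = pd.DataFrame({
    "mol": ["Cr", "Mn"],
    "name": ["set1", "set1"],
    "delta1": [1.0, 2.0],   # placeholder numbers
    "delta2": [3.0, 4.0],
})

# Every column not listed in id_vars becomes a (variable, value) pair
long = wide.melt(id_vars=["mol", "name"])

print(long.columns.tolist())  # ['mol', 'name', 'variable', 'value']
print(len(long))              # 4
```

Each row of the long table then corresponds to one bar segment in the plot.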

df1=pd.read_csv('data1.csv')

Let’s have a look at the dataframe:

df1

	mol	v1	v2	v3	vref
0	Cr	-2944.2912	-2813.8701	-2796.4468	-2863.6800
1	Mn	-4259.8771	-4221.8368	-4215.8657	-4279.8494
2	Co	-6125.6798	-4967.6189	-4963.6009	-4993.8201
3	Zn	1985.4178	1947.7289	1946.8668	1939.1123
4	Mo	-403.3011	-335.5684	-315.6369	-383.6767
5	Tc	-1264.2636	-1241.5628	-1228.3957	-1254.4530
6	Ru	-762.8891	69.6719	204.9638	74.8138
7	Pd	-1768.4043	-1550.3824	-1533.2350	-1477.3337
8	Ag	4589.0627	4408.0483	4408.4061	4377.6442
9	W	4468.8709	4547.5992	4567.8561	4436.9428
10	Re	3414.5834	3435.8192	3446.8677	3390.2474
11	Pt	2557.4652	3066.9424	3125.3993	2884.9378

df1,df1_te=prep_data(df1,'set1')

df2=pd.read_csv('data2.csv')

df2, df2_te=prep_data(df2,'set2')

df_plot = pd.concat([df1.set_index('mol'),
                     df2.set_index('mol')]).reset_index()
df_plot_te = pd.concat([df1_te.set_index('mol'),
                     df2_te.set_index('mol')]).reset_index()

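# relabel the contributions with the Greek letter delta for the legend;
# assign the result back instead of using inplace=True on a column,
# which does not reliably update the dataframe in recent pandas versions
df_plot['variable'] = df_plot['variable'].replace({'delta1': '\u0394(1)',
                                                   'delta2': '\u0394(2)',
                                                   'delta3': '\u0394(3)'})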

df_plot_all = pd.merge(df_plot, df_plot_te, on=['mol','name'])
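The plotting code below refers to columns such as `value_x` and `value_y`, which come from this merge. Here is a small sketch of how they appear, assuming miniature stand-ins for df_plot and df_plot_te (the frames `contrib` and `total` are hypothetical; the numbers are the rounded Cr values from the fake data).

```python
import pandas as pd

# Hypothetical long-form frame of contributions (stand-in for df_plot)
contrib = pd.DataFrame({
    "mol": ["Cr", "Cr", "Cr"],
    "name": ["set1", "set1", "set1"],
    "variable": ["\u0394(1)", "\u0394(2)", "\u0394(3)"],
    "value": [130.4211, 17.4233, -67.2332],
})

# Hypothetical long-form frame of the total effect (stand-in for df_plot_te)
total = pd.DataFrame({
    "mol": ["Cr"],
    "name": ["set1"],
    "variable": ["total_effect"],
    "value": [80.6112],
})

# Columns present in both frames but not used as keys get the
# default suffixes '_x' (left frame) and '_y' (right frame)
merged = pd.merge(contrib, total, on=["mol", "name"])

print(merged.columns.tolist())
# ['mol', 'name', 'variable_x', 'value_x', 'variable_y', 'value_y']
print(len(merged))  # 3
```

So `value_x` holds the individual contributions and `value_y` repeats the total effect on every row of the same molecule and set, which is what lets both layers of the chart share one dataframe.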

order_mol=['Cr', 'Mn', 'Co', 'Zn', 'Mo', 'Tc', 'Ru', 'Pd', 'Ag', 'W', 'Re', 'Pt']
order_where=['set1','set2']


bars=alt.Chart(df_plot_all).mark_bar(size=15).encode(     

    # which field to group columns on
    x=alt.X('name:O',
            axis=alt.Axis(grid=True,labelFontSize=8),
            sort=order_where,
            title=None),

    # which field to use as Y values and how to calculate
    y=alt.Y('value_x:Q',
            axis=alt.Axis(grid=True,title=None)),

    # which field to color by & legend
    color=alt.Color('variable_x',
                    scale=alt.Scale(range=['#4381d1', '#47c488', '#ff6f69']),
                    legend=alt.Legend(title="Contributions",
                                      orient="right",
                                      direction="horizontal",
                                      offset=-200,
                                      titleFontSize=16,
                                      labelFontSize=14)),
                   
    # how to order the data on bars (the labels are strings, so nominal, not quantitative)
    order=alt.Order('variable_x:N', sort='ascending'))


# use separate tick marks for the 'total effect'
rules = alt.Chart(df_plot_all).mark_tick(
    color='black',
    thickness=1.5,
    size=15
).encode(
    x=alt.X('name:O', axis=alt.Axis(grid=True, title=None)),
    y=alt.Y('value_y:Q', axis=alt.Axis(grid=True, title=None)))


# combine all together: layer bars and ticks, then facet by molecule
(alt.layer(bars, rules)
    .properties(height=450, width=50)
    .facet(column=alt.Column(
        'mol',
        sort=order_mol,
        header=alt.Header(title='Contributions to solvent shifts on NMR shieldings',
                          orient='bottom',
                          titleFontSize=20,
                          labelFontSize=14,
                          labelBaseline='line-top',
                          labelAlign='center',
                          labelAnchor='middle')))
    .resolve_scale(x='independent')
    .configure_view(strokeOpacity=0))

Final note

The total effect is additionally marked by horizontal black lines. It would be better to add these lines to the legend, but from what I saw it is not straightforward to do at this point.

Thanks:

The scripts used in this notebook combine advice found (mostly) on Stack Overflow, which I have lost track of… So big thanks to all the Altair experts out there!