Discovering Altair - bar plots

Mar 20, 2021

The first day of spring seems like a good time to finally launch a series ‘Things I learned this week’:) Today I’d like to share a way to do a stacked and grouped bar plot with Altair.

This post can be viewed as a jupyter notebook - link.

What & why:

Recently we were investigating solvent effects on the NMR shielding of transition-metal nuclei in multiple complexes, using a few approximations within a few computational techniques. In short: a lot of information to track.

It is not easy to find a recipe for effective visualization in this case, especially if one wants to capture all important data in one figure. Here I show how Altair can be used to plot the contributions to the solvent effects for the whole series and two calculation methods.

Data description:

  • df1 and df2 collect the data from two calculation methods (read from data1.csv and data2.csv files)
  • $v_1$ and $v_{ref}$ are two boundary reference values:
    • $v_1$ - corresponds to the property value with no effect included
    • $v_{ref}$ - corresponds to the property value with a total effect included
    • therefore this effect is estimated as: $v = v_{ref} - v_1$
  • $v_2$ and $v_3$ are approximations to $v_{ref}$. We then define:
    • $\Delta(1) = v_2 - v_1$
    • $\Delta(2) = v_3 - v_2$
    • $\Delta(3) = v_{ref} - v_3$
      In particular, $\Delta(3)$ can be interpreted as the portion of the effect that is not described by approximations $v_2$ and $v_3$.
  • I use fake data in this exercise
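
A quick sanity check of the definitions above: the three contributions telescope, so they should add up to the total effect $v_{ref} - v_1$. The sketch below uses the Cr row of the df1 table shown later in this post (fake data); the variable names are only for illustration.

```python
# Cr row of df1 (fake data): v1, v2, v3, vref
v1, v2, v3, vref = -2944.2912, -2813.8701, -2796.4468, -2863.6800

delta1 = v2 - v1      # Delta(1)
delta2 = v3 - v2      # Delta(2)
delta3 = vref - v3    # Delta(3): the part not captured by v2 and v3
total = vref - v1     # total effect

# the intermediate values cancel, so the sum equals the total effect
assert abs((delta1 + delta2 + delta3) - total) < 1e-9
print(delta1, delta2, delta3, total)
```

This is why the stacked bars and the "total effect" marker in the final plot can be compared directly.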
import pandas as pd
import altair as alt
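def prep_data(df, name):
    """Compute the contributions and the total effect; return both in long format."""
    df = df.copy()  # leave the caller's dataframe untouched
    df['name'] = name  # label of the calculation method, e.g. 'set1'
    df_te = df[['mol', 'name']].copy()

    # contributions, as defined in the data description
    df['delta1'] = df['v2'] - df['v1']
    df['delta2'] = df['v3'] - df['v2']
    df['delta3'] = df['vref'] - df['v3']

    # total effect: vref - v1
    df_te['total_effect'] = df['vref'] - df['v1']

    # keep only the contributions and reshape to long format:
    # one row per (mol, name, variable) with the number in 'value'
    df = df.drop(['v1', 'v2', 'v3', 'vref'], axis=1)
    df = df.melt(id_vars=['mol', 'name'])

    df_te = df_te.melt(id_vars=['mol', 'name'])

    return df, df_te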
df1=pd.read_csv('data1.csv')

Let’s have a look at the dataframe:

df1
    mol          v1          v2          v3        vref
 0  Cr   -2944.2912  -2813.8701  -2796.4468  -2863.6800
 1  Mn   -4259.8771  -4221.8368  -4215.8657  -4279.8494
 2  Co   -6125.6798  -4967.6189  -4963.6009  -4993.8201
 3  Zn    1985.4178   1947.7289   1946.8668   1939.1123
 4  Mo    -403.3011   -335.5684   -315.6369   -383.6767
 5  Tc   -1264.2636  -1241.5628  -1228.3957  -1254.4530
 6  Ru    -762.8891     69.6719    204.9638     74.8138
 7  Pd   -1768.4043  -1550.3824  -1533.2350  -1477.3337
 8  Ag    4589.0627   4408.0483   4408.4061   4377.6442
 9  W     4468.8709   4547.5992   4567.8561   4436.9428
10  Re    3414.5834   3435.8192   3446.8677   3390.2474
11  Pt    2557.4652   3066.9424   3125.3993   2884.9378
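
Before running prep_data on the full table, it may help to see what the reshaping step does on a tiny frame. The sketch below repeats the same drop-and-melt logic on the first two rows of the table; the names `wide` and `long` are used only for this illustration.

```python
import pandas as pd

# first two rows of df1 (fake data)
wide = pd.DataFrame({
    "mol": ["Cr", "Mn"],
    "v1": [-2944.2912, -4259.8771],
    "v2": [-2813.8701, -4221.8368],
    "v3": [-2796.4468, -4215.8657],
    "vref": [-2863.6800, -4279.8494],
})
wide["name"] = "set1"
wide["delta1"] = wide["v2"] - wide["v1"]
wide["delta2"] = wide["v3"] - wide["v2"]
wide["delta3"] = wide["vref"] - wide["v3"]

# wide -> long: every delta column becomes a (variable, value) pair per molecule
long = wide.drop(columns=["v1", "v2", "v3", "vref"]).melt(id_vars=["mol", "name"])
print(long)
```

Each molecule now takes three rows, one per contribution. A long layout like this lets Altair map `variable` to color and stack the `value` column.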
df1,df1_te=prep_data(df1,'set1')
df2=pd.read_csv('data2.csv')
df2, df2_te=prep_data(df2,'set2')
df_plot = pd.concat([df1.set_index('mol'),
                     df2.set_index('mol')]).reset_index()
df_plot_te = pd.concat([df1_te.set_index('mol'),
                     df2_te.set_index('mol')]).reset_index()
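# assign the result back: calling replace(..., inplace=True) on a selected column
# may not update df_plot in recent pandas versions
df_plot['variable'] = df_plot['variable'].replace({'delta1': '\u0394'+'(1)',
                                                   'delta2': '\u0394'+'(2)',
                                                   'delta3': '\u0394'+'(3)',
                                                   'delta4': '\u0394'+'(4)'
                                                  })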
df_plot_all = pd.merge(df_plot, df_plot_te, on=['mol','name'])
order_mol=['Cr', 'Mn', 'Co', 'Zn', 'Mo', 'Tc', 'Ru', 'Pd', 'Ag', 'W', 'Re', 'Pt']
order_where=['set1','set2']


bars=alt.Chart(df_plot_all).mark_bar(size=15).encode(     

    # which field to group columns on
    x=alt.X('name:O',
            axis=alt.Axis(grid=True,labelFontSize=8),
            sort=order_where,
            title=None),

    # which field to use as Y values and how to calculate
    y=alt.Y('value_x:Q',
            axis=alt.Axis(grid=True,title=None)),

    # which field to color by & legend
    color=alt.Color('variable_x',
                    scale=alt.Scale(range=['#4381d1', '#47c488', '#ff6f69']),
                    legend=alt.Legend(title="Contributions",
                                      orient="right",
                                      direction="horizontal",
                                      offset=-200,
                                      titleFontSize=16,
                                      labelFontSize=14)),
                   
    # how to order the data on bars (variable_x holds string labels, so nominal rather than quantitative)
    order=alt.Order('variable_x:N', sort='ascending'))


# use separate marks for the 'total effect'
rules = alt.Chart(df_plot_all).mark_tick(color='black', 
                                         thickness=1.5,
                                         size=15
                                        ).encode(x=alt.X('name:O',axis=alt.Axis(grid=True,title=None)),
                                                 y=alt.Y('value_y:Q',axis=alt.Axis(grid=True,title=None)))


# combine all together
alt.layer(bars,rules).properties(height=450,width=50).facet(
   column=alt.Column('mol',
                     sort=order_mol,
                     header=alt.Header(title='Contributions to solvent shifts on NMR shieldings',
                                       orient='bottom',
                                       titleFontSize=20,
                                       labelFontSize=14,
                                       labelBaseline='line-top',
                                       labelAlign='center',
                                       labelAnchor='middle'))
).resolve_scale(x='independent').configure_view(strokeOpacity=0)

Final note

The total effect is additionally marked by horizontal black lines. It would be better to add these lines to the legend, but from what I saw this is not straightforward to do at this point.

Thanks:

The scripts in this notebook combine advice found (mostly) on Stack Overflow, which I lost track of… So big thanks to all Altair experts out there!