Discovering Altair - bar plots
Mar 20, 2021
The first day of spring seems like a good time to finally launch the series ‘Things I learned this week’ :) Today I’d like to share a way to make a stacked and grouped bar plot with Altair.
This post is also available as a Jupyter notebook (link).
What & why:
Recently we were investigating solvent effects on the NMR shielding of transition metal nuclei in multiple complexes, using a few approximations within a few computational techniques. In short: a lot of information to track.
It is not easy to find a recipe for effective visualization in this case, especially if one wants to capture all the important data in one figure. Here I show how Altair can be used to plot the contributions to the solvent effects for the whole series and for two calculation methods.
Data description:
- df1 and df2 collect the data from two calculation methods (read from the data1.csv and data2.csv files)
- $v_1$ and $v_{ref}$ are two boundary reference values:
  - $v_1$ corresponds to the property value with no effect included
  - $v_{ref}$ corresponds to the property value with the total effect included
  - the total effect is therefore estimated as $v = v_{ref} - v_1$
- $v_2$ and $v_3$ are approximations to $v_{ref}$. We then define:
  - $\Delta(1) = v_2 - v_1$
  - $\Delta(2) = v_3 - v_2$
  - $\Delta(3) = v_{ref} - v_3$
  - since the intermediate values cancel, the three contributions add up to the total effect: $\Delta(1) + \Delta(2) + \Delta(3) = v_{ref} - v_1 = v$

  In particular, $\Delta(3)$ can be interpreted as the portion of the effect that is not described by the approximations $v_2$ and $v_3$.
- I use fake data in this exercise
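As a quick check of these definitions, here is the arithmetic for a single molecule, using the Cr row of `df1` shown further down (fake data, like everything here). It also shows that an individual contribution can be negative, as $\Delta(3)$ is for Cr, while the three still sum to the total effect $v$.

```python
# Values for Cr taken from the df1 table in this post (fake data)
v1, v2, v3, vref = -2944.2912, -2813.8701, -2796.4468, -2863.6800

delta1 = v2 - v1      # Δ(1): effect captured by approximation v2
delta2 = v3 - v2      # Δ(2): extra effect captured by approximation v3
delta3 = vref - v3    # Δ(3): remainder not described by v2 and v3
total = vref - v1     # v: the total effect

print(f"Δ(1)={delta1:.4f}  Δ(2)={delta2:.4f}  Δ(3)={delta3:.4f}  v={total:.4f}")

# The intermediate terms cancel, so the contributions add up to the total
assert abs((delta1 + delta2 + delta3) - total) < 1e-9
```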
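```python
import pandas as pd
import altair as alt

def prep_data(df, name):
    """Compute the contributions for one method and reshape them to long format."""
    df['name'] = name                       # label the calculation method
    df_te = df[['mol', 'name']].copy()
    # contributions Δ(1), Δ(2), Δ(3) as defined above
    df['delta1'] = df['v2'] - df['v1']
    df['delta2'] = df['v3'] - df['v2']
    df['delta3'] = df['vref'] - df['v3']
    # total effect, v = vref - v1
    df_te['total_effect'] = df['vref'] - df['v1']
    # keep only the contributions and go from wide to long format
    df = df.drop(['v1', 'v2', 'v3', 'vref'], axis=1)
    df = df.melt(id_vars=['mol', 'name'])
    df_te = df_te.melt(id_vars=['mol', 'name'])
    return df, df_te

df1 = pd.read_csv('data1.csv')
```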
Let’s have a look at the dataframe:
```python
df1
```
| | mol | v1 | v2 | v3 | vref |
|---|---|---|---|---|---|
| 0 | Cr | -2944.2912 | -2813.8701 | -2796.4468 | -2863.6800 |
| 1 | Mn | -4259.8771 | -4221.8368 | -4215.8657 | -4279.8494 |
| 2 | Co | -6125.6798 | -4967.6189 | -4963.6009 | -4993.8201 |
| 3 | Zn | 1985.4178 | 1947.7289 | 1946.8668 | 1939.1123 |
| 4 | Mo | -403.3011 | -335.5684 | -315.6369 | -383.6767 |
| 5 | Tc | -1264.2636 | -1241.5628 | -1228.3957 | -1254.4530 |
| 6 | Ru | -762.8891 | 69.6719 | 204.9638 | 74.8138 |
| 7 | Pd | -1768.4043 | -1550.3824 | -1533.2350 | -1477.3337 |
| 8 | Ag | 4589.0627 | 4408.0483 | 4408.4061 | 4377.6442 |
| 9 | W | 4468.8709 | 4547.5992 | 4567.8561 | 4436.9428 |
| 10 | Re | 3414.5834 | 3435.8192 | 3446.8677 | 3390.2474 |
| 11 | Pt | 2557.4652 | 3066.9424 | 3125.3993 | 2884.9378 |
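`prep_data` relies on `DataFrame.melt` to turn the wide table (one column per contribution) into the long format Altair works with best (one row per molecule, method and contribution). A minimal sketch on a hypothetical one-molecule frame, with made-up numbers, shows the shape change:

```python
import pandas as pd

# Hypothetical wide-format data: one row per (mol, name), one column per contribution
wide = pd.DataFrame({
    "mol": ["Cr"],
    "name": ["set1"],
    "delta1": [130.4],
    "delta2": [17.4],
    "delta3": [-67.2],
})

# melt keeps the id columns and stacks the remaining columns into two new
# ones: 'variable' (the old column name) and 'value' (the number)
long = wide.melt(id_vars=["mol", "name"])
print(long)
```

Each contribution becomes its own row, which is what lets Altair color and stack the bar segments by the `variable` column.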
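```python
# prepare both data sets
df1, df1_te = prep_data(df1, 'set1')
df2 = pd.read_csv('data2.csv')
df2, df2_te = prep_data(df2, 'set2')

# put the two methods into one table
df_plot = pd.concat([df1.set_index('mol'),
                     df2.set_index('mol')]).reset_index()
df_plot_te = pd.concat([df1_te.set_index('mol'),
                        df2_te.set_index('mol')]).reset_index()

# use Δ(1), Δ(2), Δ(3) as legend labels; the result is assigned back because
# replace(..., inplace=True) on a selected column is chained assignment,
# which recent pandas versions warn about
df_plot['variable'] = df_plot['variable'].replace({'delta1': '\u0394(1)',
                                                   'delta2': '\u0394(2)',
                                                   'delta3': '\u0394(3)'})

# attach the total effect to every contribution row
df_plot_all = pd.merge(df_plot, df_plot_te, on=['mol', 'name'])
```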
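Both `df_plot` and `df_plot_te` contain `variable` and `value` columns, so `pd.merge` keeps both copies and disambiguates them with its default suffixes `_x` (left frame) and `_y` (right frame). This is why the chart code uses `variable_x`/`value_x` for the contributions and `value_y` for the total effect. A small sketch with hypothetical numbers:

```python
import pandas as pd

# Hypothetical long-format contributions: three rows per (mol, name)
contrib = pd.DataFrame({
    "mol": ["Cr"] * 3,
    "name": ["set1"] * 3,
    "variable": ["Δ(1)", "Δ(2)", "Δ(3)"],
    "value": [130.4, 17.4, -67.2],
})
# Hypothetical total effect: one row per (mol, name)
total = pd.DataFrame({
    "mol": ["Cr"],
    "name": ["set1"],
    "variable": ["total_effect"],
    "value": [80.6],
})

# Overlapping non-key columns get the default suffixes '_x' and '_y';
# the single total-effect row is repeated for each contribution row
merged = pd.merge(contrib, total, on=["mol", "name"])
print(merged.columns.tolist())
```

Because the total is repeated on every contribution row, the tick layer draws the same tick several times at one position, which does not change how the plot looks.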
```python
# order of the facets (molecules) and of the bars within each facet (methods)
order_mol = ['Cr', 'Mn', 'Co', 'Zn', 'Mo', 'Tc', 'Ru', 'Pd', 'Ag', 'W', 'Re', 'Pt']
order_where = ['set1', 'set2']

bars = alt.Chart(df_plot_all).mark_bar(size=15).encode(
    # which field to group columns on
    x=alt.X('name:O',
            axis=alt.Axis(grid=True, labelFontSize=8),
            sort=order_where,
            title=None),
    # contribution values; segments sharing the same x are stacked
    y=alt.Y('value_x:Q',
            axis=alt.Axis(grid=True, title=None)),
    # which field to color by & legend
    color=alt.Color('variable_x:N',
                    scale=alt.Scale(range=['#4381d1', '#47c488', '#ff6f69']),
                    legend=alt.Legend(title="Contributions",
                                      orient="right",
                                      direction="horizontal",
                                      offset=-200,
                                      titleFontSize=16,
                                      labelFontSize=14)),
    # order of the segments within each stacked bar
    # (variable_x holds text labels, so it is nominal, not quantitative)
    order=alt.Order('variable_x:N', sort='ascending'))

# use separate tick marks for the total effect
rules = alt.Chart(df_plot_all).mark_tick(color='black',
                                         thickness=1.5,
                                         size=15
                                         ).encode(x=alt.X('name:O', axis=alt.Axis(grid=True, title=None)),
                                                  y=alt.Y('value_y:Q', axis=alt.Axis(grid=True, title=None)))

# combine both layers and facet by molecule
alt.layer(bars, rules).properties(height=450, width=50).facet(
    column=alt.Column('mol:N',
                      sort=order_mol,
                      header=alt.Header(title='Contributions to solvent shifts on NMR shieldings',
                                        orient='bottom',
                                        titleFontSize=20,
                                        labelFontSize=14,
                                        labelBaseline='line-top',
                                        labelAlign='center',
                                        labelAnchor='middle'))
).resolve_scale(x='independent').configure_view(strokeOpacity=0)
```
Final note
The total effect is additionally marked by black horizontal ticks. It would be better to show these ticks in the legend as well, but from what I found this is not straightforward to do at the moment.
Thanks:
The scripts used in this notebook combine advice found (mostly) on Stack Overflow, which I lost track of… So big thanks to all the Altair experts out there!