Other articles


  1. Getting Started with PySpark

    Some Context

    I have been using the pandas library for almost two years now, but I have always wanted to try PySpark in a big data project. Since I intend to build a daily habit of taking notes on what I've learnt (which I haven't really …

    read more
  2. Conditional Colors in Plotly Tables

    Problem

    Generate a data table in Plotly that has the following features:

    1. Alternating cell and line colors for odd/even rows
    2. Unique cell color on first column
    3. For third column onwards, color cells using two different colors based on two levels of upper-bound/lower-bound conditions
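
    The three requirements above can be sketched roughly as follows. This is not the article's own method (its steps are truncated here), and the colors, the bounds and the helper names `cell_colors` and `build_table` are all hypothetical. It also assumes that "two levels" means a mild band and a severe band around an acceptable range. Plotly tables take colors column by column, so the helper returns one list of colors per column.

```python
# Hypothetical colors and bounds; none of these values come from the article.
ODD_ROW, EVEN_ROW = "white", "lightgrey"
ODD_LINE, EVEN_LINE = "lightgrey", "darkgrey"
FIRST_COL = "lightblue"
MILD, SEVERE = "khaki", "salmon"
INNER = (20, 80)  # assumed first-level lower/upper bounds
OUTER = (10, 90)  # assumed second-level lower/upper bounds


def cell_colors(columns):
    """Return one list of fill colors per column (Plotly's column-major layout)."""
    n_rows = len(columns[0])
    # Row index 0 is treated as the first (odd) row.
    stripes = [ODD_ROW if i % 2 == 0 else EVEN_ROW for i in range(n_rows)]
    colors = []
    for col_idx, values in enumerate(columns):
        if col_idx == 0:
            # Requirement 2: unique color on the first column.
            colors.append([FIRST_COL] * n_rows)
        elif col_idx == 1:
            # Requirement 1: plain odd/even striping.
            colors.append(list(stripes))
        else:
            # Requirement 3: two colors for two levels of bounds,
            # falling back to the row stripe when the value is in range.
            col_colors = []
            for row_idx, value in enumerate(values):
                if value < OUTER[0] or value > OUTER[1]:
                    col_colors.append(SEVERE)
                elif value < INNER[0] or value > INNER[1]:
                    col_colors.append(MILD)
                else:
                    col_colors.append(stripes[row_idx])
            colors.append(col_colors)
    return colors


def build_table(header, columns):
    """Build a Plotly figure; Plotly is imported lazily so cell_colors works without it."""
    import plotly.graph_objects as go

    n_rows = len(columns[0])
    lines = [ODD_LINE if i % 2 == 0 else EVEN_LINE for i in range(n_rows)]
    return go.Figure(
        go.Table(
            header=dict(values=header),
            cells=dict(
                values=columns,
                fill_color=cell_colors(columns),
                line_color=[lines] * len(columns),
            ),
        )
    )
```

    Keeping the color logic in a plain function makes the conditions easy to check before any figure is drawn; `build_table(["Name", "Group", "Score"], columns).show()` would then render the table.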

    What I did

    Step 1 …

    read more
  3. Dataframe manipulation sequence - GroupBy Agg, Melt, Unstack

    Problem

    From a pandas DataFrame, reshape the data so that the order count and total amount can be determined for each Vendor and each Vendor-Buyer combination.
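
    As a rough illustration of the goal, here is a minimal sketch on hypothetical data. The rows and the `Amount` column name are assumptions, since the article's own DataFrame is truncated in this excerpt, and the article may well take a different route through melt and unstack.

```python
import pandas as pd

# Hypothetical data; not the DataFrame used in the article.
df = pd.DataFrame(
    {
        "Vendor": ["A", "A", "A", "B", "B"],
        "Buyer": ["BU1", "BU1", "BU2", "BU1", "BU2"],
        "Amount": [100, 50, 25, 10, 40],
    }
)

# Order count and total amount for each Vendor-Buyer combination.
per_pair = df.groupby(["Vendor", "Buyer"])["Amount"].agg(Count="count", Total="sum")

# The same two measures for each Vendor.
per_vendor = df.groupby("Vendor")["Amount"].agg(Count="count", Total="sum")

# Pivot the Buyer level into columns: one row per Vendor,
# with (measure, Buyer) pairs as columns.
wide = per_pair.unstack("Buyer")

print(per_vendor)
print(wide)
```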

    :::python

    >> df = pd.DataFrame(data=
        {'Vendor': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C',
                'D', 'D', 'E', 'E', 'E', 'E', 'E'],      
        'Buyer …
    read more
  4. MultiIndex.to_frame()

    Problem

    From a MultiIndex dataframe, determine the total number of elements in the Buyer column for each Vendor.
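
    A minimal sketch of one way to approach this, assuming Vendor and Buyer are the two index levels. The data and the `Amount` column are hypothetical, and the article's own walkthrough may differ.

```python
import pandas as pd

# Hypothetical data; not the DataFrame used in the article.
df = pd.DataFrame(
    {
        "Vendor": ["A", "A", "B", "C", "C", "C"],
        "Buyer": ["BU1", "BU2", "BU1", "BU1", "BU2", "BU3"],
        "Amount": [10, 20, 30, 40, 50, 60],
    }
).set_index(["Vendor", "Buyer"])

# to_frame turns the index levels into ordinary columns;
# index=False gives the result a fresh integer index.
index_frame = df.index.to_frame(index=False)

# Count the Buyer entries under each Vendor.
buyer_counts = index_frame.groupby("Vendor")["Buyer"].count()

print(buyer_counts)
```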

    What I did

    Let's say we have the following DataFrame:

    :::python

    >> df = pd.DataFrame(data=
        {'Vendor': ['A', 'A', 'B', 'C', 'C', 'C',
                'D', 'D', 'E', 'E', 'F', 'G', 'G'],      
        'Buyer':['BU1', 'BU3 …
    read more
  5. MultiIndex.set_levels() in pandas

    Problem

    A user filed an issue on the pandas repo regarding MultiIndex.set_levels. It turned out that the documentation had led the user to confuse the set_levels method with the set_names method for MultiIndex. Hence, the MultiIndex.set_levels documentation was marked by the maintainers for improvements to clarify …
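
    The difference between the two methods can be shown with a small sketch. The index below is hypothetical and is not taken from the issue: `set_names` changes what the levels are called, while `set_levels` replaces the label values stored in a level.

```python
import pandas as pd

# Hypothetical two-level index; names and labels are illustrative only.
mi = pd.MultiIndex.from_product(
    [["A", "B"], ["BU1", "BU2"]], names=["Vendor", "Buyer"]
)

# set_names changes only the level names; the labels are untouched.
renamed = mi.set_names(["Seller", "Customer"])

# set_levels replaces the labels of a level; the level names are untouched.
relabelled = mi.set_levels(["X", "Y"], level="Vendor")

print(list(renamed.names))
print(list(relabelled.get_level_values("Vendor")))
```

    Both methods return a new MultiIndex and leave `mi` unchanged.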

    read more
