scitex_pd

scitex-pd — pandas helpers extracted from the SciTeX ecosystem.

Functionalities

force_df(x) — coerce dict / Series / list / scalar / ndarray into a pandas.DataFrame with sensible defaults.
from_xyz(df, x, y, z) / to_xy(df) / to_xyz(df) — long ↔ wide pivots; to_numeric(df) — column-wise numeric coercion.
find_pval(df) / _find_pval_col — locate p-value columns by name.
find_indi(df, conditions) / get_unique(df, column) – match rows against per-column conditions; fetch a column's single unique value.
merge_columns / merge_cols, melt_cols, mv / mv_to_first / mv_to_last — column combine / reshape / reorder.
replace(df, old_value, new_value), round(df, factor), slice(df, …), sort(df, …) – uniform DataFrame-in / DataFrame-out transforms.
ignore_setting_with_copy_warning() (also exposed as ignore_SettingWithCopyWarning()) – context manager that temporarily suppresses the pandas SettingWithCopyWarning.

IO

Reads: pandas.DataFrame, pandas.Series, numpy.ndarray, dict, list, scalar inputs.
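Writes: nothing to disk; functions return pandas objects. Most return a new object, but force_df returns a DataFrame input unchanged (the same object), and sort modifies its input when inplace=True.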

Dependencies

Hard: pandas, numpy, scitex-types (for is_listed_X).

Standalone import:

import scitex_pd as pd_
data = {'condition': ['a', 'b'], 'p_value': [0.05, 0.01]}
df = pd_.force_df(data)
pvals = pd_.find_pval(df)

The umbrella scitex.pd import path is preserved via a sys.modules-alias bridge in scitex-python.

scitex_pd.find_indi(df, conditions)

Finds indices of rows that satisfy conditions, handling NaN values.

Example

>>> import pandas as pd
>>> from scitex_pd import find_indi
>>> df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', 'x']})
>>> conditions = {'A': [1, None], 'B': 'x'}
>>> result = find_indi(df, conditions)

Parameters:

df (pd.DataFrame) – Input DataFrame
conditions (Dict[str, Union[str, int, float, List]]) – Column conditions

Returns:

List of integer indices of matching rows

Return type:

List[int]
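The matching rule can be approximated in plain pandas. This is a minimal sketch, not the scitex_pd code: it assumes a condition is either a single value or a list of accepted values, that None/NaN inside a list matches missing cells, and that conditions on different columns are combined with AND. `find_indi_sketch` is a hypothetical name.

```python
import pandas as pd


def find_indi_sketch(df: pd.DataFrame, conditions: dict) -> list:
    """Hypothetical approximation of find_indi (not the scitex_pd code)."""
    mask = pd.Series(True, index=df.index)
    for col, accepted in conditions.items():
        # A scalar condition is treated as a one-element list
        values = accepted if isinstance(accepted, (list, tuple)) else [accepted]
        # None/NaN among the accepted values means "also match missing cells"
        wants_missing = any(pd.isna(v) for v in values)
        concrete = [v for v in values if not pd.isna(v)]
        col_mask = df[col].isin(concrete)
        if wants_missing:
            col_mask = col_mask | df[col].isna()
        mask = mask & col_mask  # every column condition must hold
    return df.index[mask.to_numpy()].tolist()


df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', 'x']})
print(find_indi_sketch(df, {'A': [1, None], 'B': 'x'}))  # [0, 2]
```

Plain `isin` never matches NaN, which is why missing values are split out and handled with `isna()`.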

scitex_pd.find_pval(data, multiple=True)

Find p-value column name(s) or key(s) in various data structures.

Return type: Union[str, None, List[str]]

Example

>>> import pandas as pd
>>> from scitex_pd import find_pval
>>> df = pd.DataFrame({'p_value': [0.05, 0.01], 'pval': [0.1, 0.001], 'other': [1, 2]})
>>> find_pval(df)
['p_value', 'pval']
>>> find_pval(df, multiple=False)
'p_value'

Parameters:

data (Union[pd.DataFrame, np.ndarray, List, Dict]) – Data structure to search for a p-value column or key
multiple (bool, optional) – If True, return all matches; if False, return only the first match (default is True)

Returns:

Name(s) of the column(s) or key(s) that match p-value patterns (a list when multiple=True, a single name when multiple=False), or None if not found
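Name-based detection can be sketched with a single regular expression. The pattern below is an assumption (the pattern scitex_pd actually uses is not documented here), `find_pval_sketch` is a hypothetical name, and only DataFrames and dicts are handled, not arrays or lists.

```python
import re

import pandas as pd

# Assumed naming convention: p_value, p-value, pval, p_val, ... (any case)
PVAL_PATTERN = re.compile(r"^p[-_]?val(ue)?$", re.IGNORECASE)


def find_pval_sketch(data, multiple=True):
    """Hypothetical name-based lookup; only DataFrames and dicts are handled."""
    if isinstance(data, pd.DataFrame):
        names = [str(c) for c in data.columns]
    elif isinstance(data, dict):
        names = [str(k) for k in data]
    else:
        return None  # arrays and lists are out of scope for this sketch
    matches = [n for n in names if PVAL_PATTERN.match(n)]
    if not matches:
        return None
    return matches if multiple else matches[0]


df = pd.DataFrame({'p_value': [0.05, 0.01], 'pval': [0.1, 0.001], 'other': [1, 2]})
print(find_pval_sketch(df))                  # ['p_value', 'pval']
print(find_pval_sketch(df, multiple=False))  # p_value
```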

scitex_pd.force_df(data, filler=nan)

Convert various data types to pandas DataFrame.

Parameters:

data (various) – The data to convert to DataFrame. Can be DataFrame, Series, ndarray, list, tuple, dict, scalar value, etc.
filler (any, optional) – Value to use for filling missing values, by default np.nan

Returns:

Data converted to DataFrame

Return type:

pd.DataFrame

Examples

>>> import scitex
>>> import pandas as pd
>>> import numpy as np

>>> # DataFrame input returns the same DataFrame
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> scitex.pd.force_df(df) is df
True

>>> # Series input is converted to DataFrame
>>> series = pd.Series([1, 2, 3], name='test')
>>> scitex.pd.force_df(series)
   test
0     1
1     2
2     3

>>> # NumPy array input is converted to DataFrame
>>> arr = np.array([1, 2, 3])
>>> scitex.pd.force_df(arr)
   value
0      1
1      2
2      3

>>> # Scalar values are converted to single-value DataFrames
>>> scitex.pd.force_df(42)
   value
0     42

>>> # Lists and tuples are converted to DataFrame
>>> scitex.pd.force_df([1, 2, 3])
   value
0      1
1      2
2      3

>>> # Dictionaries are converted to DataFrame with appropriate handling
>>> # of different length values
>>> data = {'A': [1, 2, 3], 'B': [4, 5]}
>>> scitex.pd.force_df(data)
   A    B
0  1    4
1  2    5
2  3  NaN
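The dictionary case pads shorter values with filler so that every column has the same length. A minimal sketch of just that padding step, assuming list-like or scalar dict values; `pad_dict_to_df` is a hypothetical helper, not part of scitex_pd.

```python
import numpy as np
import pandas as pd


def pad_dict_to_df(data: dict, filler=np.nan) -> pd.DataFrame:
    """Hypothetical helper: pad unequal-length dict values with `filler`."""
    # Scalars become one-element lists; list-likes are copied into plain lists
    columns = {
        key: list(value) if isinstance(value, (list, tuple, np.ndarray, pd.Series)) else [value]
        for key, value in data.items()
    }
    max_len = max((len(v) for v in columns.values()), default=0)
    padded = {key: v + [filler] * (max_len - len(v)) for key, v in columns.items()}
    return pd.DataFrame(padded)


print(pad_dict_to_df({'A': [1, 2, 3], 'B': [4, 5]}))
print(pad_dict_to_df({'A': [1, 2, 3], 'B': [4, 5]}, filler=0))
```

Padding explicitly (rather than calling fillna afterwards) keeps NaN values that were already present in the input untouched.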

scitex_pd.from_xyz(data_frame, x=None, y=None, z=None, square=False)

Convert a DataFrame with ‘x’, ‘y’, ‘z’ format into a heatmap DataFrame.

Example

import pandas as pd
from scitex_pd import from_xyz

data = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'A'],
    'col2': ['X', 'Y', 'Z', 'Y'],
    'p_val': [0.01, 0.05, 0.001, 0.1]
})
data = data.rename(columns={"col1": "x", "col2": "y", "p_val": "z"})
result = from_xyz(data)
print(result)

Parameters:

data_frame (pandas.DataFrame) – Input DataFrame with columns for x, y, and z values.
x (str, optional) – Name of the column to use as x-axis. Defaults to ‘x’.
y (str, optional) – Name of the column to use as y-axis. Defaults to ‘y’.
z (str, optional) – Name of the column to use as z-values. Defaults to ‘z’.
square (bool, optional) – If True, force the output to be a square matrix. Defaults to False.

Returns:

A DataFrame in heatmap/pivot format.

Return type:

pandas.DataFrame
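The long-to-wide conversion corresponds to a pandas pivot. A minimal sketch, assuming x labels become rows and y labels become columns (the orientation scitex_pd uses may differ) and that square=True means "same sorted label set on both axes"; `from_xyz_sketch` is a hypothetical name.

```python
import pandas as pd


def from_xyz_sketch(df, x='x', y='y', z='z', square=False):
    """Hypothetical long-to-wide pivot: rows from column `x`, columns from `y`."""
    # pivot raises ValueError if an (x, y) pair occurs more than once
    wide = df.pivot(index=x, columns=y, values=z)
    if square:
        # Put the union of x and y labels on both axes; missing cells become NaN
        labels = sorted(set(wide.index) | set(wide.columns))
        wide = wide.reindex(index=labels, columns=labels)
    return wide


data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'A'],
    'y': ['X', 'Y', 'Z', 'Y'],
    'z': [0.01, 0.05, 0.001, 0.1],
})
print(from_xyz_sketch(data))
print(from_xyz_sketch(data, square=True))
```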

scitex_pd.get_unique(df, column, default=None, raise_on_multiple=False)

Return a column's value when the column holds exactly one unique value.

Parameters:

df (DataFrame) – DataFrame to extract from
column (str) – Column name to check
default (Optional[Any]) – Default value if column doesn’t exist or has multiple unique values
raise_on_multiple (bool) – If True, raise ValueError when multiple unique values exist

Return type:

Any

Returns:

The unique value if exactly one exists, otherwise default value

Examples

>>> import pandas as pd
>>> from scitex_pd import get_unique
>>> df = pd.DataFrame({'patient_id': ['P01', 'P01', 'P01']})
>>> get_unique(df, 'patient_id')
'P01'

>>> df = pd.DataFrame({'patient_id': ['P01', 'P02']})
>>> get_unique(df, 'patient_id', default='Unknown')
'Unknown'

>>> # Raise error on multiple values
>>> get_unique(df, 'patient_id', raise_on_multiple=True)
Traceback (most recent call last):
    ...
ValueError: Column 'patient_id' has 2 unique values: ['P01', 'P02']
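The documented behavior reduces to a check on `Series.unique()`. A minimal sketch under the assumption that NaN counts as a distinct value and that an empty column falls back to the default; `get_unique_sketch` is a hypothetical name.

```python
import pandas as pd


def get_unique_sketch(df, column, default=None, raise_on_multiple=False):
    """Hypothetical approximation of get_unique (NaN counts as a value here)."""
    if column not in df.columns:
        return default
    values = df[column].unique()
    if len(values) == 1:
        return values[0]
    if raise_on_multiple and len(values) > 1:
        raise ValueError(
            f"Column '{column}' has {len(values)} unique values: {list(values)}"
        )
    return default  # empty column, or several values without raise_on_multiple


print(get_unique_sketch(pd.DataFrame({'patient_id': ['P01', 'P01']}), 'patient_id'))
```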

scitex_pd.ignore_SettingWithCopyWarning()

Context manager to temporarily ignore pandas SettingWithCopyWarning.

Example

>>> with ignore_SettingWithCopyWarning():
...     df['column'] = new_values  # No warning will be shown

scitex_pd.ignore_setting_with_copy_warning()

Context manager to temporarily ignore pandas SettingWithCopyWarning.

Example

>>> with ignore_setting_with_copy_warning():
...     df['column'] = new_values  # No warning will be shown

scitex_pd.melt_cols(df, cols, id_columns=None)

Melt specified columns while preserving links to other data in a DataFrame.

Example

>>> import pandas as pd
>>> from scitex_pd import melt_cols
>>> data = pd.DataFrame({
...     'id': [1, 2, 3],
...     'name': ['Alice', 'Bob', 'Charlie'],
...     'score_1': [85, 90, 78],
...     'score_2': [92, 88, 95]
... })
>>> melted = melt_cols(data, cols=['score_1', 'score_2'])
>>> print(melted)
   id     name variable  value
0   1    Alice  score_1     85
1   2      Bob  score_1     90
2   3  Charlie  score_1     78
3   1    Alice  score_2     92
4   2      Bob  score_2     88
5   3  Charlie  score_2     95

Parameters:

df (pd.DataFrame) – Input DataFrame
cols (List[str]) – Columns to be melted
id_columns (Optional[List[str]], default None) – Columns to preserve as identifiers. If None, all columns not in ‘cols’ are used.

Returns:

Melted DataFrame with preserved identifier columns

Return type:

pd.DataFrame

Raises:

ValueError – If cols are not present in the DataFrame
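The output shown above matches what `pandas.melt` produces when every non-melted column is passed as `id_vars`. A minimal sketch on that assumption; `melt_cols_sketch` is a hypothetical name, not the scitex_pd implementation.

```python
import pandas as pd


def melt_cols_sketch(df, cols, id_columns=None):
    """Hypothetical approximation of melt_cols built on pandas.melt."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    if id_columns is None:
        # Every column that is not being melted links back to the original row
        id_columns = [c for c in df.columns if c not in cols]
    return pd.melt(df, id_vars=id_columns, value_vars=cols)


data = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score_1': [85, 90, 78],
    'score_2': [92, 88, 95],
})
print(melt_cols_sketch(data, cols=['score_1', 'score_2']))
```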

scitex_pd.merge_cols(df, *args, sep=None, sep1='_', sep2='-', name='merged')

Alternative name for merge_columns(); its description, examples, and parameters are identical to the merge_columns entry.

scitex_pd.merge_columns(df, *args, sep=None, sep1='_', sep2='-', name='merged')

Creates a new column by joining specified columns.

Example

>>> import pandas as pd
>>> from scitex_pd import merge_columns
>>> df = pd.DataFrame({
...     'A': [0, 5, 10],
...     'B': [1, 6, 11],
...     'C': [2, 7, 12]
... })
>>> # Simple concatenation with separator
>>> merge_columns(df, 'A', 'B', sep=' ')
   A  B  C    A_B
0  0  1  2    0 1
1  5  6  7    5 6
2 10 11 12  10 11

>>> # With column labels
>>> merge_columns(df, 'A', 'B', sep1='_', sep2='-')
   A  B  C        A_B
0  0  1  2    A-0_B-1
1  5  6  7    A-5_B-6
2 10 11 12  A-10_B-11

Parameters:

df (pd.DataFrame) – Input DataFrame
*args (Union[str, List[str], Tuple[str, ...]]) – Column names to join
sep (str, optional) – Simple separator for values only (overrides sep1/sep2)
sep1 (str, optional) – Separator between column-value pairs, by default “_”
sep2 (str, optional) – Separator between column name and value, by default “-”
name (str, optional) – Name for the merged column, by default “merged”

Returns:

DataFrame with added merged column

Return type:

pd.DataFrame
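Both joining modes can be sketched with a row-wise `apply`. This is a minimal sketch, not the scitex_pd code: it assumes plain column-name arguments only (no list or tuple arguments) and writes the result under `name`; `merge_columns_sketch` is a hypothetical name.

```python
import pandas as pd


def merge_columns_sketch(df, *cols, sep=None, sep1='_', sep2='-', name='merged'):
    """Hypothetical approximation: only plain column-name arguments are handled."""
    out = df.copy()
    subset = df[list(cols)]
    if sep is not None:
        # Values only, e.g. "0 1"
        out[name] = subset.astype(str).apply(lambda row: sep.join(row), axis=1)
    else:
        # Column label and value, e.g. "A-0_B-1"
        out[name] = subset.apply(
            lambda row: sep1.join(f"{col}{sep2}{row[col]}" for col in cols), axis=1
        )
    return out


df = pd.DataFrame({'A': [0, 5, 10], 'B': [1, 6, 11], 'C': [2, 7, 12]})
print(merge_columns_sketch(df, 'A', 'B', sep=' '))
print(merge_columns_sketch(df, 'A', 'B'))
```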

scitex_pd.mv(df, key, position, axis=1)

Move a row or column to a specified position in a DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame.
key (str) – The label of the row or column to move.
position (int) – The position to move the row or column to.
axis (int, optional) – 0 for rows, 1 for columns. Defaults to 1.

Returns:

A new DataFrame with the row or column moved.

Return type:

pandas.DataFrame

scitex_pd.mv_to_first(df, key, axis=1)

Move a row or column to the first position in a DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame.
key (str) – The label of the row or column to move.
axis (int, optional) – 0 for rows, 1 for columns. Defaults to 1.

Returns:

A new DataFrame with the row or column moved to the first position.

Return type:

pandas.DataFrame

scitex_pd.mv_to_last(df, key, axis=1)

Move a row or column to the last position in a DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame.
key (str) – The label of the row or column to move.
axis (int, optional) – 0 for rows, 1 for columns. Defaults to 1.

Returns:

A new DataFrame with the row or column moved to the last position.

Return type:

pandas.DataFrame
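The three move helpers can all be expressed as one label-list reordering followed by `reindex`. A minimal sketch, assuming unique labels and non-negative positions; `mv_sketch`, `mv_to_first_sketch`, and `mv_to_last_sketch` are hypothetical names, not the scitex_pd functions.

```python
import pandas as pd


def mv_sketch(df, key, position, axis=1):
    """Hypothetical: move one row (axis=0) or column (axis=1) label to `position`."""
    labels = list(df.columns if axis == 1 else df.index)
    labels.remove(key)            # raises ValueError if `key` is absent
    labels.insert(position, key)  # list.insert appends if position is past the end
    return df.reindex(columns=labels) if axis == 1 else df.reindex(index=labels)


def mv_to_first_sketch(df, key, axis=1):
    return mv_sketch(df, key, 0, axis)


def mv_to_last_sketch(df, key, axis=1):
    return mv_sketch(df, key, df.shape[axis] - 1, axis)


df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
print(list(mv_to_first_sketch(df, 'c').columns))  # ['c', 'a', 'b']
print(list(mv_to_last_sketch(df, 'a').columns))   # ['b', 'c', 'a']
```

Because `reindex` returns a new object, the input DataFrame keeps its original order.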

scitex_pd.replace(dataframe, old_value, new_value=None, regex=False, cols=None)

Replace values in a DataFrame.

Example

import pandas as pd
from scitex_pd import replace

df = pd.DataFrame({'A': ['abc-123', 'def-456'], 'B': ['ghi-789', 'jkl-012']})

# Replace single value
df_replaced = replace(df, 'abc', 'xyz')

# Replace with dictionary, restricted to column A
replace_dict = {'-': '_', '1': 'one'}
df_replaced = replace(df, replace_dict, cols=['A'])
print(df_replaced)

Parameters:

dataframe (pandas.DataFrame) – Input DataFrame to modify.
old_value (str, dict) – If str, the value to replace (requires new_value). If dict, mapping of old values (keys) to new values (values).
new_value (str, optional) – New value to replace old_value with. Required if old_value is str.
regex (bool, optional) – If True, treat replacement keys as regular expressions. Default is False.
cols (list of str, optional) – List of column names to apply replacements. If None, apply to all columns.

Returns:

DataFrame with specified replacements applied.

Return type:

pandas.DataFrame
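A column-restricted replacement can be sketched on top of `DataFrame.replace`. This assumes pandas semantics, where regex=False replaces only cells whose whole value equals the key and regex=True replaces substrings; whether scitex_pd.replace follows the same rule is not stated here. `replace_sketch` is a hypothetical name.

```python
import pandas as pd


def replace_sketch(dataframe, old_value, new_value=None, regex=False, cols=None):
    """Hypothetical approximation of replace built on DataFrame.replace."""
    # Normalise the str form into the dict form
    mapping = old_value if isinstance(old_value, dict) else {old_value: new_value}
    target = list(dataframe.columns) if cols is None else list(cols)
    out = dataframe.copy()  # leave the caller's DataFrame untouched
    out[target] = out[target].replace(mapping, regex=regex)
    return out


df = pd.DataFrame({'A': ['abc-123', 'def-456'], 'B': ['ghi-789', 'jkl-012']})
print(replace_sketch(df, '-', '_', regex=True, cols=['A']))
```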

scitex_pd.round(df, factor=3)

Round numeric values in a DataFrame to a specified number of decimal places.

Example

>>> import pandas as pd
>>> import scitex_pd
>>> df = pd.DataFrame({'A': [1.23456, 2.34567], 'B': ['abc', 'def'], 'C': [3, 4]})
>>> scitex_pd.round(df, 2)
      A    B  C
0  1.23  abc  3
1  2.35  def  4

Parameters:

df (pd.DataFrame) – Input DataFrame
factor (int, optional) – Number of decimal places to round to (default is 3)

Returns:

DataFrame with rounded numeric values

Return type:

pd.DataFrame
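Rounding only the numeric columns can be sketched with `select_dtypes`, which leaves string columns such as B unchanged. A minimal sketch, not the scitex_pd code; `round_sketch` is a hypothetical name (it avoids shadowing Python's builtin round).

```python
import pandas as pd


def round_sketch(df, factor=3):
    """Hypothetical approximation: round numeric columns, leave others as-is."""
    out = df.copy()
    numeric = out.select_dtypes(include='number').columns
    out[numeric] = out[numeric].round(factor)
    return out


df = pd.DataFrame({'A': [1.23456, 2.34567], 'B': ['abc', 'def'], 'C': [3, 4]})
print(round_sketch(df, 2))
```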

scitex_pd.slice(df, conditions=None, columns=None)

Slices DataFrame rows and/or columns.

Example

>>> import pandas as pd
>>> import scitex_pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'x']})
>>> # Slice by row indices (the second argument is Python's builtin slice)
>>> result = scitex_pd.slice(df, slice(0, 2))
>>> # Slice by conditions
>>> result = scitex_pd.slice(df, {'A': [1, 2], 'B': 'x'})
>>> # Slice columns
>>> result = scitex_pd.slice(df, columns=['A'])

Parameters:

df (pd.DataFrame) – Input DataFrame to slice
conditions (slice, Dict, or None) – Either a slice object for row indices, or a dictionary of column conditions
columns (List[str], optional) – List of column names to select

Returns:

Sliced DataFrame

Return type:

pd.DataFrame
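The two row-selection modes can be sketched as follows. Assumptions: a slice object selects rows by position (`iloc`), a dict is AND-combined with `isin` and no special NaN handling, and the result is a copy. `slice_sketch` is a hypothetical name chosen so the builtin `slice` stays usable.

```python
import pandas as pd


def slice_sketch(df, conditions=None, columns=None):
    """Hypothetical approximation of scitex_pd.slice (not the real implementation)."""
    out = df
    if isinstance(conditions, slice):
        out = out.iloc[conditions]  # assumption: row slices are positional
    elif isinstance(conditions, dict):
        mask = pd.Series(True, index=out.index)
        for col, accepted in conditions.items():
            values = accepted if isinstance(accepted, (list, tuple)) else [accepted]
            mask = mask & out[col].isin(values)
        out = out[mask]
    if columns is not None:
        out = out[columns]
    return out.copy()


df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'x']})
print(slice_sketch(df, slice(0, 2)))
print(slice_sketch(df, {'A': [1, 2], 'B': 'x'}))
print(slice_sketch(df, columns=['A']))
```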

scitex_pd.sort(dataframe, by=None, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None, orders=None)

Sort DataFrame by specified column(s) with optional custom ordering and column reordering.

Example

import pandas as pd
from scitex_pd import sort

df = pd.DataFrame({'A': ['foo', 'bar', 'baz'], 'B': [3, 2, 1]})
custom_order = {'A': ['bar', 'baz', 'foo']}
sorted_df = sort(df, by=None, orders=custom_order)
print(sorted_df)

Parameters:

dataframe (pandas.DataFrame) – The DataFrame to sort.
by (str or list of str, optional) – Name(s) of column(s) to sort by.
ascending (bool or list of bool, default True) – Sort ascending vs. descending.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if ‘first’; ‘last’ puts NaNs at the end.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – Apply the key function to the values before sorting.
orders (dict, optional) – Dictionary of column names and their custom sort orders.

Returns:

Sorted DataFrame with reordered columns.

Return type:

pandas.DataFrame
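Custom orders of the kind passed via `orders` are commonly implemented with ordered categoricals. A minimal sketch under two assumptions: by=None means "sort by the columns named in orders", and columns are not reordered. `sort_with_orders` is a hypothetical name, and values missing from an order list would become NaN categories.

```python
import pandas as pd


def sort_with_orders(df, by=None, orders=None, ascending=True):
    """Hypothetical: sort rows using custom category orders (columns stay in place)."""
    orders = orders or {}
    if by is None:
        by = list(orders)  # assumption: by=None means "sort by the ordered columns"
    elif isinstance(by, str):
        by = [by]
    if not by:
        return df.copy()
    keyed = df.copy()
    for col, order in orders.items():
        # Ordered categoricals sort by their category order, not alphabetically
        keyed[col] = pd.Categorical(keyed[col], categories=order, ordered=True)
    row_order = keyed.sort_values(by=by, ascending=ascending).index
    # Return the original (non-categorical) values in the new row order
    return df.loc[row_order]


df = pd.DataFrame({'A': ['foo', 'bar', 'baz'], 'B': [3, 2, 1]})
print(sort_with_orders(df, orders={'A': ['bar', 'baz', 'foo']}))
```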

scitex_pd.to_numeric(df, errors='coerce')

Convert all possible columns in a DataFrame to numeric types.

Parameters:

df (pd.DataFrame) – Input DataFrame
errors (str, optional) – How to handle errors. ‘coerce’ (default) converts invalid values to NaN, ‘ignore’ leaves non-numeric columns unchanged, ‘raise’ raises exceptions.

Returns:

DataFrame with numeric columns converted

Return type:

pd.DataFrame
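Column-wise coercion can be sketched by applying `pandas.to_numeric` to each column. This is a minimal sketch, not the scitex_pd code, exercised here only with errors='coerce'; `to_numeric_sketch` is a hypothetical name.

```python
import pandas as pd


def to_numeric_sketch(df, errors='coerce'):
    """Hypothetical approximation: apply pandas.to_numeric column by column."""
    return df.apply(lambda col: pd.to_numeric(col, errors=errors))


df = pd.DataFrame({'a': ['1', '2'], 'b': ['x', '3']})
print(to_numeric_sketch(df))  # 'x' becomes NaN under errors='coerce'
```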

scitex_pd.to_xy(data_frame)

Convert a heatmap DataFrame into x, y, z format.

The index and columns are expected to carry the same labels; if only one of the two axes has labels, those labels are used for both.

Example

data_frame = pd.DataFrame(…)  # Your DataFrame here
out = to_xy(data_frame)
print(out)

Parameters:

data_frame (pandas.DataFrame) – The input DataFrame to be converted.

Returns:

A DataFrame formatted with columns [‘x’, ‘y’, ‘z’]

Return type:

pandas.DataFrame

scitex_pd.to_xyz(data_frame)

Convert a DataFrame into x, y, z format (long format).

Transforms a DataFrame from wide format (matrix/heatmap) to long format where each value becomes a row with x (row index), y (column name), and z (value) columns.

Example

data_frame = pd.DataFrame(…)  # Your DataFrame here
out = to_xyz(data_frame)
print(out)

Parameters:

data_frame (pandas.DataFrame) – The input DataFrame to be converted.

Returns:

A DataFrame formatted with columns [‘x’, ‘y’, ‘z’]

Return type:

pandas.DataFrame
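The wide-to-long transformation described above (row index to x, column name to y, cell to z) can be sketched with `pandas.melt`. A minimal sketch, not the scitex_pd code; `to_xyz_sketch` is a hypothetical name, and the row order of the result follows melt (column by column) rather than any order scitex_pd may guarantee.

```python
import pandas as pd


def to_xyz_sketch(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wide-to-long conversion: one (x, y, z) row per cell."""
    # Name the index 'x', turn it into a column, then melt the remaining columns
    return wide.rename_axis('x').reset_index().melt(
        id_vars='x', var_name='y', value_name='z'
    )


wide = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['c1', 'c2'])
print(to_xyz_sketch(wide))
```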