scitex_pd
scitex-pd — pandas helpers extracted from the SciTeX ecosystem.
Functionalities
force_df(x) — coerce dict / Series / list / scalar / ndarray into a pandas.DataFrame with sensible defaults.
from_xyz(df, x, y, z) / to_xy(df) / to_xyz(df) — long ↔ wide pivots; to_numeric(df) — column-wise numeric coercion.
find_pval(df) / _find_pval_col — locate p-value columns by name.
find_indi(df, conditions) / get_unique(df, column) – NaN-aware row matching against a dict of conditions, and unique-value inspection.
merge_columns / merge_cols, melt_cols, mv / mv_to_first / mv_to_last — column combine / reshape / reorder.
replace(dataframe, old_value, new_value), round(df, factor), slice(df, …), sort(df, …) – uniform DataFrame-in / DataFrame-out transforms.
ignore_setting_with_copy_warning() — context-manager for the pandas SettingWithCopyWarning.
IO
Reads: pandas.DataFrame, pandas.Series, numpy.ndarray, dict, list, scalar inputs.
Writes: nothing to disk; functions return pandas objects. Exceptions to "inputs are not mutated": force_df returns a DataFrame input as-is, and sort(..., inplace=True) sorts in place.
Dependencies
Hard: pandas, numpy, scitex-types (for is_listed_X).
Standalone import:
import scitex_pd as pd_
df = pd_.force_df(data)
pvals = pd_.find_pval(df)
The umbrella scitex.pd import path is preserved via a sys.modules-alias bridge in scitex-python.
- scitex_pd.find_indi(df, conditions)
Finds indices of rows that satisfy conditions, handling NaN values.
Example
>>> df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', 'x']})
>>> conditions = {'A': [1, None], 'B': 'x'}
>>> result = find_indi(df, conditions)
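One way to get NaN-aware matching like the example above is sketched here. It is an assumption, not scitex_pd's code: the hypothetical find_indi_sketch treats a scalar condition as a one-element list, ANDs the conditions across columns, and matches requested None/NaN values explicitly with isna() instead of relying on isin().

```python
import pandas as pd


def find_indi_sketch(df: pd.DataFrame, conditions: dict) -> list:
    """Hypothetical NaN-aware row matcher; not scitex_pd's implementation."""
    mask = pd.Series(True, index=df.index)
    for col, wanted in conditions.items():
        # A scalar condition is treated as a one-element list of allowed values
        values = list(wanted) if isinstance(wanted, (list, tuple, set)) else [wanted]
        non_null = [v for v in values if not pd.isna(v)]
        col_mask = df[col].isin(non_null)
        # If None/NaN was requested, match missing cells explicitly
        if len(non_null) < len(values):
            col_mask = col_mask | df[col].isna()
        # Conditions on different columns are combined with AND
        mask = mask & col_mask
    return df.index[mask.to_numpy()].tolist()


df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', 'x']})
print(find_indi_sketch(df, {'A': [1, None], 'B': 'x'}))  # [0, 2]
```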
- scitex_pd.find_pval(data, multiple=True)
Find p-value column name(s) or key(s) in various data structures.
Example
>>> df = pd.DataFrame({'p_value': [0.05, 0.01], 'pval': [0.1, 0.001], 'other': [1, 2]})
>>> find_pval(df)
['p_value', 'pval']
>>> find_pval(df, multiple=False)
'p_value'
- Parameters:
data (Union[pd.DataFrame, np.ndarray, List, Dict]) – Data structure to search for a p-value column or key.
multiple (bool, optional) – If True, return all matches; if False, return only the first match. Default is True.
- Returns:
Name(s) of the column(s) or key(s) that match p-value patterns, or None if not found.
- Return type:
Union[Optional[str], List[str]]
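Name-based detection can be sketched with a regular expression. The pattern below is a hypothetical stand-in (the actual patterns scitex_pd uses are not documented here), and find_pval_sketch only handles DataFrame columns and dict keys, not arrays or lists. Note that with multiple=True it returns an empty list when nothing matches.

```python
import re

import pandas as pd

# Hypothetical naming rule: names beginning with "pval", "p_val", "p-val",
# "p val", "pvalue", "p_value", ... (case-insensitive)
PVAL_PATTERN = re.compile(r"^p[-_ ]?val(ue)?", re.IGNORECASE)


def find_pval_sketch(data, multiple=True):
    """Return p-value-like column names (DataFrame) or keys (dict)."""
    if isinstance(data, pd.DataFrame):
        names = list(data.columns)
    elif isinstance(data, dict):
        names = list(data.keys())
    else:
        names = []  # arrays and lists are not handled in this sketch
    matches = [n for n in names if isinstance(n, str) and PVAL_PATTERN.match(n)]
    if multiple:
        return matches
    return matches[0] if matches else None


df = pd.DataFrame({'p_value': [0.05, 0.01], 'pval': [0.1, 0.001], 'other': [1, 2]})
print(find_pval_sketch(df))                  # ['p_value', 'pval']
print(find_pval_sketch(df, multiple=False))  # p_value
```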
- scitex_pd.force_df(data, filler=nan)
Convert various data types to pandas DataFrame.
- Parameters:
data (various) – The data to convert to DataFrame. Can be DataFrame, Series, ndarray, list, tuple, dict, scalar value, etc.
filler (any, optional) – Value to use for filling missing values, by default np.nan
- Returns:
Data converted to DataFrame
- Return type:
pd.DataFrame
Examples
>>> import scitex
>>> import pandas as pd
>>> import numpy as np
# DataFrame input returns the same DataFrame
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> scitex.pd.force_df(df) is df
True
# Series input is converted to DataFrame
>>> series = pd.Series([1, 2, 3], name='test')
>>> scitex.pd.force_df(series)
   test
0     1
1     2
2     3
# NumPy array input is converted to DataFrame
>>> arr = np.array([1, 2, 3])
>>> scitex.pd.force_df(arr)
   value
0      1
1      2
2      3
# Scalar values are converted to single-value DataFrames
>>> scitex.pd.force_df(42)
   value
0     42
# Lists and tuples are converted to DataFrame
>>> scitex.pd.force_df([1, 2, 3])
   value
0      1
1      2
2      3
# Dictionaries are converted to DataFrame with appropriate handling
# of different length values
>>> data = {'A': [1, 2, 3], 'B': [4, 5]}
>>> scitex.pd.force_df(data)
   A    B
0  1    4
1  2    5
2  3  NaN
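The dictionary case above can be approximated by padding shorter values with the filler before building the frame. This is a sketch under the assumption that "appropriate handling" means right-padding with filler; dict_to_padded_df is a hypothetical helper, not part of scitex_pd.

```python
import numpy as np
import pandas as pd


def dict_to_padded_df(data: dict, filler=np.nan) -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame from a dict whose values differ in length."""
    seq_types = (list, tuple, np.ndarray, pd.Series)
    # Scalars count as length-1 columns
    columns = {k: list(v) if isinstance(v, seq_types) else [v] for k, v in data.items()}
    n_rows = max((len(v) for v in columns.values()), default=0)
    # Right-pad every column with `filler` up to the longest length
    padded = {k: v + [filler] * (n_rows - len(v)) for k, v in columns.items()}
    return pd.DataFrame(padded)


print(dict_to_padded_df({'A': [1, 2, 3], 'B': [4, 5]}))
```

With the default NaN filler, pandas stores column B as float64, so plain pandas prints its values as 4.0 and 5.0.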
- scitex_pd.from_xyz(data_frame, x=None, y=None, z=None, square=False)
Convert a DataFrame with ‘x’, ‘y’, ‘z’ format into a heatmap DataFrame.
Example
import pandas as pd
from scitex_pd import from_xyz
data = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'A'],
    'col2': ['X', 'Y', 'Z', 'Y'],
    'p_val': [0.01, 0.05, 0.001, 0.1],
})
data = data.rename(columns={"col1": "x", "col2": "y", "p_val": "z"})
result = from_xyz(data)
print(result)
- Parameters:
data_frame (pandas.DataFrame) – Input DataFrame with columns for x, y, and z values.
x (str, optional) – Name of the column to use as x-axis. Defaults to ‘x’.
y (str, optional) – Name of the column to use as y-axis. Defaults to ‘y’.
z (str, optional) – Name of the column to use as z-values. Defaults to ‘z’.
square (bool, optional) – If True, force the output to be a square matrix. Defaults to False.
- Returns:
A DataFrame in heatmap/pivot format.
- Return type:
pandas.DataFrame
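The long-to-wide conversion can be sketched with DataFrame.pivot_table. Three assumptions apply: x becomes the row index (mirroring to_xyz, where x is the row label), duplicate (x, y) pairs are averaged, and square=True is read as "reindex both axes to the sorted union of x and y labels". from_xyz_sketch is hypothetical.

```python
import pandas as pd


def from_xyz_sketch(df, x="x", y="y", z="z", square=False):
    """Hypothetical long-to-wide pivot: rows = x labels, columns = y labels, cells = z."""
    # pivot_table averages duplicate (x, y) pairs (aggfunc="mean")
    wide = df.pivot_table(index=x, columns=y, values=z, aggfunc="mean")
    if square:
        # Use the same ordered label set on both axes; missing cells become NaN
        labels = sorted(set(wide.index) | set(wide.columns))
        wide = wide.reindex(index=labels, columns=labels)
    return wide


data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'A'],
    'y': ['X', 'Y', 'Z', 'Y'],
    'z': [0.01, 0.05, 0.001, 0.1],
})
print(from_xyz_sketch(data))
```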
- scitex_pd.get_unique(df, column, default=None, raise_on_multiple=False)
Get value from column if it contains a unique value.
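- Parameters:
df (pd.DataFrame) – Input DataFrame.
column (str) – Name of the column to inspect.
default (any, optional) – Value returned when the column does not hold exactly one unique value, by default None.
raise_on_multiple (bool, optional) – If True, raise ValueError when the column holds more than one unique value, by default False.
- Returns:
The unique value if exactly one exists, otherwise the default value.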
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({'patient_id': ['P01', 'P01', 'P01']})
>>> get_unique(df, 'patient_id')
'P01'
>>> df = pd.DataFrame({'patient_id': ['P01', 'P02']})
>>> get_unique(df, 'patient_id', default='Unknown')
'Unknown'
>>> # Raise error on multiple values
>>> get_unique(df, 'patient_id', raise_on_multiple=True)
Traceback (most recent call last):
...
ValueError: Column 'patient_id' has 2 unique values: ['P01', 'P02']
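The behaviour in these examples fits in a few lines. This get_unique_sketch is a hypothetical re-implementation; in particular it assumes NaN counts as a distinct value (Series.unique() keeps it) and that an empty column falls back to the default.

```python
import pandas as pd


def get_unique_sketch(df, column, default=None, raise_on_multiple=False):
    """Hypothetical get_unique(): the single unique value, else `default` (or raise)."""
    values = df[column].unique()  # NaN, if present, counts as a value here
    if len(values) == 1:
        return values[0]
    if raise_on_multiple and len(values) > 1:
        raise ValueError(
            f"Column '{column}' has {len(values)} unique values: {list(values)}"
        )
    return default


print(get_unique_sketch(pd.DataFrame({'patient_id': ['P01', 'P01', 'P01']}), 'patient_id'))  # P01
```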
- scitex_pd.ignore_SettingWithCopyWarning()
Same as ignore_setting_with_copy_warning(); documented below.
- scitex_pd.ignore_setting_with_copy_warning()
Context manager to temporarily ignore pandas SettingWithCopyWarning.
Example
>>> with ignore_setting_with_copy_warning():
...     df['column'] = new_values  # No warning will be shown
- scitex_pd.melt_cols(df, cols, id_columns=None)
Melt specified columns while preserving links to other data in a DataFrame.
Example
>>> data = pd.DataFrame({
...     'id': [1, 2, 3],
...     'name': ['Alice', 'Bob', 'Charlie'],
...     'score_1': [85, 90, 78],
...     'score_2': [92, 88, 95]
... })
>>> melted = melt_cols(data, cols=['score_1', 'score_2'])
>>> print(melted)
   id     name variable  value
0   1    Alice  score_1     85
1   2      Bob  score_1     90
2   3  Charlie  score_1     78
3   1    Alice  score_2     92
4   2      Bob  score_2     88
5   3  Charlie  score_2     95
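- Parameters:
df (pd.DataFrame) – Input DataFrame.
cols (list of str) – Columns to melt into variable/value pairs.
id_columns (list of str, optional) – Identifier columns to keep on every melted row; in the example above, omitting it keeps all remaining columns (id, name).
- Returns: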
Melted DataFrame with preserved identifier columns
- Return type:
pd.DataFrame
- Raises:
ValueError – If cols are not present in the DataFrame
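The example output matches what DataFrame.melt produces when every non-melted column is used as an identifier. The sketch below assumes that default for id_columns=None; melt_cols_sketch is hypothetical.

```python
import pandas as pd


def melt_cols_sketch(df, cols, id_columns=None):
    """Hypothetical melt_cols(): melt `cols`, keep identifier columns on every row."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    if id_columns is None:
        # Assumption: keep every column that is not being melted
        id_columns = [c for c in df.columns if c not in cols]
    # melt() names the new columns "variable" and "value" by default
    return df.melt(id_vars=id_columns, value_vars=cols)


data = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score_1': [85, 90, 78],
    'score_2': [92, 88, 95],
})
print(melt_cols_sketch(data, ['score_1', 'score_2']))
```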
- scitex_pd.merge_cols(df, *args, sep=None, sep1='_', sep2='-', name='merged')
Same as merge_columns(); documented below.
- scitex_pd.merge_columns(df, *args, sep=None, sep1='_', sep2='-', name='merged')
Creates a new column by joining specified columns.
Example
>>> df = pd.DataFrame({
...     'A': [0, 5, 10],
...     'B': [1, 6, 11],
...     'C': [2, 7, 12]
... })
>>> # Simple concatenation with separator
>>> merge_columns(df, 'A', 'B', sep=' ')
    A   B   C    A_B
0   0   1   2    0 1
1   5   6   7    5 6
2  10  11  12  10 11
>>> # With column labels
>>> merge_columns(df, 'A', 'B', sep1='_', sep2='-')
    A   B   C        A_B
0   0   1   2    A-0_B-1
1   5   6   7    A-5_B-6
2  10  11  12  A-10_B-11
- Parameters:
df (pd.DataFrame) – Input DataFrame
*args (Union[str, List[str], Tuple[str, ...]]) – Column names to join
sep (str, optional) – Simple separator for values only (overrides sep1/sep2)
sep1 (str, optional) – Separator between column-value pairs, by default “_”
sep2 (str, optional) – Separator between column name and value, by default “-”
name (str, optional) – Name for the merged column, by default “merged”
- Returns:
DataFrame with added merged column
- Return type:
pd.DataFrame
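Both joining modes can be sketched with string conversion and a row-wise join. merge_columns_sketch is hypothetical, and it names the new column with `name` ("merged" by default), whereas the example output above shows "A_B". The real naming rule may therefore differ from this sketch.

```python
import pandas as pd


def merge_columns_sketch(df, *args, sep=None, sep1="_", sep2="-", name="merged"):
    """Hypothetical re-implementation; returns a copy with one extra column."""
    # Accept column names given individually or as lists/tuples
    cols = []
    for arg in args:
        cols.extend(arg if isinstance(arg, (list, tuple)) else [arg])
    out = df.copy()
    as_text = out[cols].astype(str)
    if sep is not None:
        # Values only, e.g. "0 1"
        out[name] = as_text.apply(sep.join, axis=1)
    else:
        # "<column><sep2><value>" pairs joined by sep1, e.g. "A-0_B-1"
        labelled = pd.DataFrame({c: c + sep2 + as_text[c] for c in cols})
        out[name] = labelled.apply(sep1.join, axis=1)
    return out


df = pd.DataFrame({'A': [0, 5, 10], 'B': [1, 6, 11], 'C': [2, 7, 12]})
print(merge_columns_sketch(df, 'A', 'B', sep1='_', sep2='-'))
```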
- scitex_pd.mv(df, key, position, axis=1)
Move a row or column to a specified position in a DataFrame.
Args:
    df (pandas.DataFrame): The input DataFrame.
    key (str): The label of the row or column to move.
    position (int): The position to move the row or column to.
    axis (int, optional): 0 for rows, 1 for columns. Defaults to 1.
Returns:
    pandas.DataFrame: A new DataFrame with the row or column moved.
- scitex_pd.mv_to_first(df, key, axis=1)
Move a row or column to the first position in a DataFrame.
Args:
    df (pandas.DataFrame): The input DataFrame.
    key (str): The label of the row or column to move.
    axis (int, optional): 0 for rows, 1 for columns. Defaults to 1.
Returns:
    pandas.DataFrame: A new DataFrame with the row or column moved to the first position.
- scitex_pd.mv_to_last(df, key, axis=1)
Move a row or column to the last position in a DataFrame.
Args:
    df (pandas.DataFrame): The input DataFrame.
    key (str): The label of the row or column to move.
    axis (int, optional): 0 for rows, 1 for columns. Defaults to 1.
Returns:
    pandas.DataFrame: A new DataFrame with the row or column moved to the last position.
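All three helpers reduce to reordering a label list and re-indexing. The sketch below assumes that negative positions count from the end (so -1 means "last"). mv_sketch and its two wrappers are hypothetical names, not scitex_pd's code.

```python
import pandas as pd


def mv_sketch(df, key, position, axis=1):
    """Hypothetical mv(): returns a reordered copy; negative positions count from the end."""
    labels = list(df.columns if axis == 1 else df.index)
    labels.remove(key)
    if position < 0:
        # -1 means "last", -2 "second to last", ...
        position = len(labels) + 1 + position
    labels.insert(position, key)
    return df[labels] if axis == 1 else df.loc[labels]


def mv_to_first_sketch(df, key, axis=1):
    return mv_sketch(df, key, 0, axis=axis)


def mv_to_last_sketch(df, key, axis=1):
    return mv_sketch(df, key, -1, axis=axis)


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(mv_to_first_sketch(df, 'C'))
```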
- scitex_pd.replace(dataframe, old_value, new_value=None, regex=False, cols=None)
Replace values in a DataFrame.
Example
import pandas as pd
from scitex_pd import replace
df = pd.DataFrame({'A': ['abc-123', 'def-456'], 'B': ['ghi-789', 'jkl-012']})
# Replace single value
df_replaced = replace(df, 'abc', 'xyz')
# Replace with dictionary
replace_dict = {'-': '_', '1': 'one'}
df_replaced = replace(df, replace_dict, cols=['A'])
print(df_replaced)
- Parameters:
dataframe (pandas.DataFrame) – Input DataFrame to modify.
old_value (str, dict) – If str, the value to replace (requires new_value). If dict, mapping of old values (keys) to new values (values).
new_value (str, optional) – New value to replace old_value with. Required if old_value is str.
regex (bool, optional) – If True, treat replacement keys as regular expressions. Default is False.
cols (list of str, optional) – List of column names to apply replacements. If None, apply to all columns.
- Returns:
DataFrame with specified replacements applied.
- Return type:
pandas.DataFrame
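The documentation does not say whether matches are whole cell values or substrings. The sketch below assumes substring replacement via Series.str.replace, applied in dictionary order to text columns only. Under that assumption, {'-': '_', '1': 'one'} turns 'abc-123' into 'abc_one23'. replace_sketch is hypothetical.

```python
import pandas as pd


def replace_sketch(dataframe, old_value, new_value=None, regex=False, cols=None):
    """Hypothetical substring replacement restricted to `cols` (all columns if None)."""
    if isinstance(old_value, dict):
        mapping = old_value
    elif new_value is None:
        raise ValueError("new_value is required when old_value is not a dict")
    else:
        mapping = {old_value: new_value}
    out = dataframe.copy()
    for col in (cols if cols is not None else list(out.columns)):
        series = out[col]
        # Only text columns are touched; numeric columns pass through unchanged
        if not (pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series)):
            continue
        for old, new in mapping.items():  # applied in dictionary order
            series = series.str.replace(old, new, regex=regex)
        out[col] = series
    return out


df = pd.DataFrame({'A': ['abc-123', 'def-456'], 'B': ['ghi-789', 'jkl-012']})
print(replace_sketch(df, {'-': '_', '1': 'one'}, cols=['A']))
```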
- scitex_pd.round(df, factor=3)
Round numeric values in a DataFrame to a specified number of decimal places.
Example
>>> df = pd.DataFrame({'A': [1.23456, 2.34567], 'B': ['abc', 'def'], 'C': [3, 4]})
>>> round(df, 2)
      A    B  C
0  1.23  abc  3
1  2.35  def  4
- Parameters:
df (pd.DataFrame) – Input DataFrame
factor (int, optional) – Number of decimal places to round to (default is 3)
- Returns:
DataFrame with rounded numeric values
- Return type:
pd.DataFrame
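Rounding only the numeric columns, as in the example, can be sketched by selecting numeric dtypes first and leaving text columns untouched. round_sketch is a hypothetical name chosen to avoid shadowing the builtin round.

```python
import pandas as pd


def round_sketch(df, factor=3):
    """Hypothetical round(): round numeric columns to `factor` decimals, leave others as-is."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].round(factor)
    return out


df = pd.DataFrame({'A': [1.23456, 2.34567], 'B': ['abc', 'def'], 'C': [3, 4]})
print(round_sketch(df, 2))
```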
- scitex_pd.slice(df, conditions=None, columns=None)
Slices DataFrame rows and/or columns.
Example
>>> import scitex_pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'x']})
>>> # Slice by row indices (call through the module so the builtin slice() is not shadowed)
>>> result = scitex_pd.slice(df, slice(0, 2))
>>> # Slice by conditions
>>> result = scitex_pd.slice(df, {'A': [1, 2], 'B': 'x'})
>>> # Slice columns
>>> result = scitex_pd.slice(df, columns=['A'])
- scitex_pd.sort(dataframe, by=None, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None, orders=None)
Sort DataFrame by specified column(s) with optional custom ordering and column reordering.
Example
import pandas as pd
from scitex_pd import sort
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'], 'B': [3, 2, 1]})
custom_order = {'A': ['bar', 'baz', 'foo']}
sorted_df = sort(df, by=None, orders=custom_order)
print(sorted_df)
- Parameters:
dataframe (pandas.DataFrame) – The DataFrame to sort.
by (str or list of str, optional) – Name(s) of column(s) to sort by.
ascending (bool or list of bool, default True) – Sort ascending vs. descending.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if ‘first’; ‘last’ puts NaNs at the end.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – Apply the key function to the values before sorting.
orders (dict, optional) – Dictionary of column names and their custom sort orders.
- Returns:
Sorted DataFrame with reordered columns.
- Return type:
pandas.DataFrame
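Custom orderings like `orders` can be expressed through the `key` argument of DataFrame.sort_values by mapping each value to its position in the requested list. This sketch covers only the ordering part, not the column reordering the docstring also mentions. sort_with_orders_sketch is hypothetical.

```python
import pandas as pd


def sort_with_orders_sketch(df, orders, ascending=True):
    """Hypothetical: sort by the columns in `orders`, ranking values by list position."""
    ranks = {col: {value: i for i, value in enumerate(order)} for col, order in orders.items()}

    def to_rank(series):
        # sort_values passes each `by` column as a named Series; values
        # missing from the custom order map to NaN and therefore sort last
        return series.map(ranks[series.name])

    return df.sort_values(by=list(orders), ascending=ascending, key=to_rank)


df = pd.DataFrame({'A': ['foo', 'bar', 'baz'], 'B': [3, 2, 1]})
print(sort_with_orders_sketch(df, {'A': ['bar', 'baz', 'foo']}))
```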
- scitex_pd.to_numeric(df, errors='coerce')
Convert all possible columns in a DataFrame to numeric types.
- Parameters:
df (pd.DataFrame) – Input DataFrame
errors (str, optional) – How to handle errors. ‘coerce’ (default) converts invalid values to NaN, ‘ignore’ leaves non-numeric columns unchanged, ‘raise’ raises exceptions.
- Returns:
DataFrame with numeric columns converted
- Return type:
pd.DataFrame
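Column-wise coercion can be sketched with pandas.to_numeric applied to each column. The 'ignore' mode is emulated with try/except rather than passed to pandas, because recent pandas versions deprecate errors='ignore' in pandas.to_numeric. to_numeric_sketch is hypothetical.

```python
import pandas as pd


def to_numeric_sketch(df, errors="coerce"):
    """Hypothetical column-wise numeric coercion returning a new DataFrame."""
    out = df.copy()
    for col in out.columns:
        if errors == "ignore":
            # Convert the column only if every value parses; otherwise keep it as-is
            try:
                out[col] = pd.to_numeric(out[col], errors="raise")
            except (ValueError, TypeError):
                pass
        else:
            # "coerce" turns unparseable values into NaN; "raise" propagates the error
            out[col] = pd.to_numeric(out[col], errors=errors)
    return out


df = pd.DataFrame({'a': ['1', '2'], 'b': ['x', '3']})
print(to_numeric_sketch(df))
```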
- scitex_pd.to_xy(data_frame)
Convert a heatmap DataFrame into x, y, z format.
The index and columns are expected to carry the same labels; if only one of them is labelled, those labels are used for both axes.
Example
data_frame = pd.DataFrame(…)  # Your DataFrame here
out = to_xy(data_frame)
print(out)
- Parameters:
data_frame (pandas.DataFrame) – The input DataFrame to be converted.
- Returns:
A DataFrame formatted with columns ['x', 'y', 'z']
- Return type:
pandas.DataFrame
- scitex_pd.to_xyz(data_frame)
Convert a DataFrame into x, y, z format (long format).
Transforms a DataFrame from wide format (matrix/heatmap) to long format where each value becomes a row with x (row index), y (column name), and z (value) columns.
Example
data_frame = pd.DataFrame(…)  # Your DataFrame here
out = to_xyz(data_frame)
print(out)
- Parameters:
data_frame (pandas.DataFrame) – The input DataFrame to be converted.
- Returns:
A DataFrame formatted with columns ['x', 'y', 'z']
- Return type:
pandas.DataFrame
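The wide-to-long transformation described for to_xyz (x = row label, y = column name, z = value) maps directly onto DataFrame.melt once the index is turned into a column. to_xyz_sketch is a hypothetical illustration. Its rows come out grouped by column (all of c1, then all of c2), and it may not match the real function's row order or NaN handling.

```python
import pandas as pd


def to_xyz_sketch(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wide-to-long conversion: x = row label, y = column label, z = value."""
    return (
        wide.rename_axis("x")           # name the index so reset_index() yields an "x" column
        .reset_index()
        .melt(id_vars="x", var_name="y", value_name="z")
    )


wide = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['c1', 'c2'])
print(to_xyz_sketch(wide))
```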