Replace Bio.pairwise2 with Bio.Align.PairwiseAligner

Deprecation warning:

/lib/python3.10/site-packages/Bio/ BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython.
As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.

Reverse R1 mode

It would be nice to have this as a separate option, for projects where you don't need to reverse R1 to match barcodes later on. It could go into the run modes as an extra variable.

error when using mesh_type 'circle'

Using a run_mode with mesh_type: circle always throws the following error:

Error in rule create_mesh_spatial_dge:
    jobid: 0
    output: projects/slide_seq/processed_data/mouse_brain_v2_meshed/illumina/complete_data/dge/dge.all.polyA_adapter_trimmed.mm_included.spatial_beads.mesh_55_100_Puck_190926_03_bead_locations.h5ad, projects/slide_seq/processed_data/mouse_brain_v2_meshed/illumina/complete_data/dge/dge.all.polyA_adapter_trimmed.mm_included.spatial_beads.mesh_55_100_Puck_190926_03_bead_locations.obs.csv

RuleException:
IndexError in line 462 of /data/local/rajewsky/home/lstreng/spacemake_dir_test/spacemake/spacemake/snakemake/main.smk:
index 31353 is out of bounds for axis 0 with size 8896
  File "/data/rajewsky/home/lstreng/miniconda_new/envs/spacemake_new/lib/python3.10/site-packages/snakemake/executors/", line 2330, in run_wrapper
  File "/data/local/rajewsky/home/lstreng/spacemake_dir_test/spacemake/spacemake/snakemake/main.smk", line 462, in __rule_create_mesh_spatial_dge
  File "/data/local/rajewsky/home/lstreng/spacemake_dir_test/spacemake/spacemake/spatial/", line 529, in create_meshed_adata
  File "/data/local/rajewsky/home/lstreng/spacemake_dir_test/spacemake/spacemake/spatial/", line 339, in aggregate_adata_by_indices
  File "/data/rajewsky/home/lstreng/miniconda_new/envs/spacemake_new/lib/python3.10/site-packages/snakemake/executors/", line 569, in _callback
  File "/data/rajewsky/home/lstreng/miniconda_new/envs/spacemake_new/lib/python3.10/concurrent/futures/", line 58, in run
  File "/data/rajewsky/home/lstreng/miniconda_new/envs/spacemake_new/lib/python3.10/site-packages/snakemake/executors/", line 555, in cached_or_run
  File "/data/rajewsky/home/lstreng/miniconda_new/envs/spacemake_new/lib/python3.10/site-packages/snakemake/executors/", line 2362, in run_wrapper
Exiting because a job execution failed. Look above for error message

It happens for all kinds of datasets I have tried (slideseq, visium, stereoseq, ...). It's just the meshing that fails (and things that are dependent on it), everything else works as it should.

Using mesh_type: hexagon on the same data works.

Version is 0.7.1

Potential bug: some log files have 0 size


4937 Dec  5 11:24 cutadapt.log
 991 Dec  5 11:08 fastq_to_uBAM.log
   0 Dec  5 13:53 genome.STAR.bamstats.log
   0 Dec  5 13:53 not_genome.STAR.bamstats.log
   0 Dec  5 14:13 not_rRNA.bowtie2.bamstats.log
   0 Dec  6 09:23 overview_plot.log
   0 Dec  5 13:55 rRNA.bowtie2.bamstats.log
 221 Dec  5 11:48 rRNA.bowtie2.log
   0 Dec  5 13:53 unaligned_bc_tagged.bamstats.log

Update_sample requires species parameter

Since a recent commit, the update_sample command requires the --species parameter and throws an error without it, even if the species is already defined for the sample.

Errors when adding barcode_flavor & puck (v0.7.2)

Adding a barcode_flavor or a puck through the command line throws errors:

spacemake config add_barcode_flavor --name NAME --umi r1[10:15] --cell_barcode r1[1:9]


Traceback (most recent call last):
  File "/miniconda3/envs/spacemake/bin/spacemake", line 8, in <module>
  File "/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/", line 613, in cmdline
  File "/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/", line 261, in <lambda>
    func = lambda args: add_update_delete_variable_cmdline(config, args)
  File "/miniconda3/envs/spacemake/lib/python3.10/", line 79, in inner
    return func(*args, **kwds)
  File "/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/", line 331, in add_update_delete_variable_cmdline
    var_variables = func(variable, name, **args)
  File "/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/", line 757, in add_variable
    values = self.process_variable_args(variable, **kwargs)
  File "/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/", line 722, in process_variable_args
    return self.process_barcode_flavor_args(**kwargs)
TypeError: ConfigFile.process_barcode_flavor_args() got an unexpected keyword argument 'name'

And similarly for the puck.
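
The traceback suggests that the name key is still present in the keyword arguments when they are forwarded to process_barcode_flavor_args, which does not accept it. A minimal sketch of that pattern and one possible fix follows. The two functions are simplified stand-ins written for illustration, not the actual spacemake code, and dropping the key is only one option (accepting name in the callee would be another).

```python
def process_barcode_flavor_args(umi=None, cell_barcode=None):
    # Stand-in for the real method: it has no 'name' parameter,
    # so passing name=... raises the TypeError seen above.
    return {"umi": umi, "cell_barcode": cell_barcode}


def process_variable_args(variable, **kwargs):
    # Possible fix (assumption): drop the duplicate 'name' key before
    # forwarding, since the name is already handled by the caller.
    kwargs.pop("name", None)
    if variable == "barcode_flavors":
        return process_barcode_flavor_args(**kwargs)
    raise ValueError(f"unsupported variable: {variable}")


# Keyword arguments shaped like the ones in the report.
kw = {"umi": "r1[10:15]", "cell_barcode": "r1[1:9]", "name": "NAME"}
values = process_variable_args("barcode_flavors", **kw)
```

With the pop in place the call returns the flavor definition instead of raising; without it, the same call fails with "got an unexpected keyword argument 'name'".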

spacemake init --download_species returns KeyError

These are the conda list results:

Reduce number of threads in fastq -> BAM

The rule in the drop-dropseq branch is given up to 32 threads, but most of them go unused because I/O is the bottleneck.

Benchmark I/O against the number of different samples and select a better value.

spacemake projects merge_samples

It seems that it currently starts from merging .fastq files. Is it not better to start much downstream of that, from the bam files?

Loading the STAR mapping index takes long. Treat this as a service

Snakemake supports 'services', which can be started and stopped. This sounds like a perfect fit for loading the STAR genome index once and releasing it again after the last instance has quit. In the meantime, STAR mapping jobs would not have to load their own copy of the genome index, which would let them start mapping sooner and reduce the overall memory footprint.

Fixing the header problem when mapping

STAR omits some info from the header and some Picard tools complain about it. Current workaround uses a python script which takes quite some time to rewrite the bam file.

A quicker fix is to explicitly add the header values when mapping with STAR:

--outSAMattrRGline ID:A SM:${name}

${name} needs to be the same as the filename produced upstream. I'm not sure how universal ID:A is, so please check for consistency.

Fix issue when deleting sample

Trying spacemake projects delete_sample --project_id project --sample_id sample throws a KeyError: KeyError: 'project_id_list'

config set_run_mode bug

Throws an error of the type

msg += f'SUCCESS: run mode: {run_mode} {action_msg} succesfully.\n'
NameError: name 'run_mode' is not defined

restore cmd line functionality for adding barcode_flavor

Functionality is currently broken on the master branch.

spacemake config add_barcode_flavor
usage: spacemake config add_barcode_flavor [-h] --name NAME --umi UMI --cell_barcode CELL_BARCODE
spacemake config add_barcode_flavor: error: the following arguments are required: --name, --umi, --cell_barcode

Giving name, umi and cell_barcode does not work:

spacemake config add_barcode_flavor --name fc_sts_miniseq --umi r2[0:9] --cell_barcode r1[2:27]
add_variable() called with variable=barcode_flavors name=fc_sts_miniseq kw={'umi': 'r2[0:9]', 'cell_barcode': 'r1[2:27]', 'name': 'fc_sts_miniseq'}
Traceback (most recent call last):
  File "../miniconda3/envs/smk/bin/spacemake", line 8, in <module>
  File "../miniconda3/envs/smk/lib/python3.10/site-packages/spacemake/", line 604, in cmdline
  File "../miniconda3/envs/smk/lib/python3.10/site-packages/spacemake/", line 261, in <lambda>
    func = lambda args: add_update_delete_variable_cmdline(config, args)
  File "../miniconda3/envs/smk/lib/python3.10/", line 79, in inner
    return func(*args, **kwds)
  File "../miniconda3/envs/smk/lib/python3.10/site-packages/spacemake/", line 331, in add_update_delete_variable_cmdline
    var_variables = func(variable, name, **args)
  File "../miniconda3/envs/smk/lib/python3.10/site-packages/spacemake/", line 745, in add_variable
    values = self.process_variable_args(variable, **kwargs)
  File "../miniconda3/envs/smk/lib/python3.10/site-packages/spacemake/", line 710, in process_variable_args
    return self.process_barcode_flavor_args(**kwargs)
TypeError: ConfigFile.process_barcode_flavor_args() got an unexpected keyword argument 'name'

Error "version" with spacemake config add_species

Hi! After downloading the human data, I tried to run spacemake config add_species with the following command:
spacemake config add_species --name human --genome PATH/species_data/human/GRCh38.primary_assembly.genome.fa.gz --annotation PATH/species_data/human/gencode.v38.primary_assembly.annotation.gtf.gz
but I get this error:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/spacemake/bin/spacemake", line 8, in <module>
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/site-packages/spacemake/", line 611, in cmdline
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/site-packages/spacemake/", line 234, in <lambda>
    func = lambda args: add_update_delete_variable_cmdline(config, args)
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/", line 79, in inner
    return func(*args, **kwds)
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/site-packages/spacemake/", line 305, in add_update_delete_variable_cmdline
    var_variables = func(variable, name, **args)
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/site-packages/spacemake/", line 661, in add_variable
    values = self.process_variable_args(variable, **kwargs)
  File "/home/user/anaconda3/envs/spacemake/lib/python3.9/site-packages/spacemake/", line 655, in process_variable_args
    return self.process_species_args(**kwargs)
TypeError: process_species_args() got an unexpected keyword argument 'version'

Is this a known problem, and if so, can you please help me work around it? Thanks in advance for your answer!

Error when adding the --download_species during init

Starting with a fresh installation (v0.7.2), adding the --download_species option during initialization

spacemake init --download_species human --dropseq_tools /Drop-seq_tools-2.5.1/

throws an error:

usage: spacemake [-h] [--version] {init} ...
spacemake: error: unrecognized arguments: human

config add_run_mode

  • Should we change spacemake config add_run_mode to take Boolean variables instead of flag pairs such as clean_dge and no-clean_dge? With config list_modes they appear as Booleans anyway.
  • Creating a new mode with --plot_bead_size 1 produces a mode that is listed with plot_bead_size: '1'. This suggests the value is saved as a string rather than as an int or float.
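
Both points above could be handled at argument-parsing time. The following is a minimal sketch using argparse from the standard library; the parser, the str2bool helper and the defaults are assumptions made for illustration and do not reflect how spacemake currently defines its options.

```python
import argparse


def str2bool(text):
    """Convert 'true'/'false' style strings to a real Boolean."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


# Hypothetical parser, not the real add_run_mode definition.
parser = argparse.ArgumentParser(prog="add_run_mode_sketch")
parser.add_argument("--clean_dge", type=str2bool, default=False)
# type=float makes the stored value numeric instead of the string '1'.
parser.add_argument("--plot_bead_size", type=float, default=1.0)

args = parser.parse_args(["--clean_dge", "true", "--plot_bead_size", "1"])
```

Here args.clean_dge is a Boolean and args.plot_bead_size is a float, so both would be written to the config with their proper types.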

add/update_sample needs to validate --map_strategy

Currently, no validation of the map_strategy string is performed when adding or updating a sample. If a reference is used that does not exist in the config.yaml, this triggers a confusing KeyError in mapping.smk during spacemake run.

Expected behavior: An exception should be thrown when adding or updating a sample with an invalid map_strategy.
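
A minimal sketch of such a check is shown below. It assumes that a map_strategy string consists of stages separated by "->", each written as mapper:reference (optionally followed by further fields), and that the set of references known for the sample's species can be read from the config. The function name, the syntax and the list of mappers are assumptions for illustration, not the actual spacemake implementation.

```python
def validate_map_strategy(map_strategy, known_references,
                          known_mappers=("STAR", "bowtie2")):
    """Raise ValueError if a stage names an unknown mapper or reference."""
    for stage in map_strategy.split("->"):
        parts = stage.strip().split(":")
        if len(parts) < 2 or not parts[0] or not parts[1]:
            raise ValueError(
                f"malformed stage {stage!r}: expected 'mapper:reference'"
            )
        mapper, reference = parts[0], parts[1]
        if mapper not in known_mappers:
            raise ValueError(f"unknown mapper {mapper!r} in {stage!r}")
        if reference not in known_references:
            raise ValueError(
                f"reference {reference!r} is not defined in the config; "
                f"known references: {sorted(known_references)}"
            )
    return map_strategy


# References as they might be listed for one species in config.yaml.
references = {"rRNA", "genome"}
ok = validate_map_strategy("bowtie2:rRNA->STAR:genome", references)
```

Calling this from add_sample and update_sample would surface the problem immediately, with a message naming the missing reference, instead of a KeyError during spacemake run.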

Missing input files for rule symlinks

Hi, I'm trying to run spacemake.
I've installed everything and ran into two errors:

First, spacemake init --dropseq_tools /home/eugenea/bin/Drop-seq_tools-2.5.1 --download_species human --download_species
finishes the snakemake part, but seems to fail later.

Second (after re-initializing with the downloaded genomes: spacemake config add_species --name homo_sapiens --reference genome --sequence /home/eugenea/spacemake_test/species_data/human/human_genome.fa --annotation /home/eugenea/spacemake_test/species_data/human/human_annotation.gtf ), spacemake fails to build the DAG.

The project was added as in the docs:

spacemake projects add_sample \
   --project_id spacemake_test_run \
   --sample_id visq_test \
   --R1 /home/data/fastq/100086/scRNAseq/vis1_S1_L001_R1_001.fastq.gz /home/data/fastq/100086/scRNAseq/vis1_S1_L002_R1_001.fastq.gz \
   --R2 /home/data/fastq/100086/scRNAseq/vis1_S1_L001_R2_001.fastq.gz /home/data/fastq/100086/scRNAseq/vis1_S1_L002_R2_001.fastq.gz \
   --species homo_sapiens \
   --puck visium \
   --run_mode visium \
   --barcode_flavor visium

Best, Eugene

QC_sheet rRNA % is wrong

It's currently computed from the STAR input reads and not from the total reads as it should.
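
To illustrate how much the choice of denominator matters, here is a small worked example. The read counts are made up for illustration only, and the variable names are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole


# Made-up counts, for illustration only.
total_reads = 1_000_000       # all sequenced reads
star_input_reads = 800_000    # reads that reach the STAR mapping step
rrna_reads = 100_000          # reads assigned to rRNA

# Denominator described as the current behaviour in this issue.
pct_from_star_input = percent(rrna_reads, star_input_reads)
# Denominator the issue asks for.
pct_from_total = percent(rrna_reads, total_reads)
```

With these numbers the rRNA fraction is reported as 12.5 % when divided by the STAR input reads, but is 10.0 % of the total reads.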

(Eventually migrate to python qc sheets anyway)

spatial barcodes are not processed during downsampling

When running spacemake run downsample on a dataset with spatial pucks, the output at each downsampling level only contains the dge files for _beads_no_spatial_data, and none for the spatial pucks specified for the sample.

This issue was observed in 0.5.5 and 0.7.2 (potentially on other versions, too)

Minor suggestions v2

  • Consider keeping only true and false (i.e. remove capital letters) to clean the clutter
  • Consider offering a seqscope and an scRNAseq run mode with sensible defaults
  • For the barcode_flavors we need documentation and an example for add_barcode_flavor. The r1[x:y] structure is not clear to a new user. We also need to explain the numbering, i.e. that 0 is the first base and the second number is not inclusive
  • I still think spacemake projects list should also list the barcode flavor and run modes
  • offer the user the option to use an existing genome index
  • where exactly do we define the rRNA index? Starting from scratch I don't see it anywhere and hence usually skip it
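
For the documentation point above, the r1[x:y] numbering could be explained by analogy with Python slicing, which uses the same convention (0 is the first base, the end position is exclusive). The helper below is a hypothetical illustration of that convention, not spacemake's own parser, and the read sequences are made up.

```python
import re


def extract(spec, reads):
    """Apply a spec such as 'r1[0:12]' to a dict like {'r1': ..., 'r2': ...}."""
    match = re.fullmatch(r"(r[12])\[(\d+):(\d+)\]", spec)
    if match is None:
        raise ValueError(f"cannot parse {spec!r}, expected e.g. 'r1[0:12]'")
    read = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    # Same semantics as a Python slice: start is 0-based, end is exclusive.
    return reads[read][start:end]


# Made-up 20-base read 1: a 12-base barcode followed by an 8-base UMI.
reads = {"r1": "ACGTACGTACGT" + "TTTTTTTT", "r2": "GGGGCCCC"}

cell_barcode = extract("r1[0:12]", reads)   # bases 1 to 12 of read 1
umi = extract("r1[12:20]", reads)           # bases 13 to 20 of read 1
```

So r1[0:12] covers the first twelve bases, and r1[12:20] starts at the thirteenth base without overlapping the barcode.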

Unify flags in cmd-line

Some of the commands require flags, others don't. Example:

spacemake projects set_barcode_flavor --project_id etc


spacemake config add_barcode_flavor --name name

doesn't work currently

List of joined spatial units in meshed adata

In the function

def create_meshed_adata(adata,
                        spot_diameter_um = 55,
                        spot_distance_um = 100,
                        bead_diameter_um = 10,
                        mesh_type = 'circle',

The "meshed" adata_out object constitutes the main output, e.g.,

adata_out = anndata.AnnData(csc_matrix(joined_C_sumed),
                            obs = pd.DataFrame({'x_pos': joined_coordinates[:, 0],
                                                'y_pos': joined_coordinates[:, 1]}),
                            var = adata.var)

For some analyses, it is relevant to have a list of spatial units that have been merged into each newly created mesh cell. That is, for each new cell in adata_out, a list of spots. This can be done by index (pointing to the .obs_names in the adata input), such that it can be stored as a sparse matrix where 1 denotes mapping.

I propose both of these lists can be stored as adata_out.uns["indices_joined_spatial_units"] and adata_out.uns["spatial_units_obs_names"]. The purpose of this second list is to keep track of the adata.obs_names, which could contain relevant information such as cell barcodes that can be traced back to the final bam file.

Edit 1:
It is important to modularize spacemake.spatial.util.create_meshed_adata by separating the mapping of spatial units to the mesh and the construction of the adata. This is helpful when integrating other types of spatial unit aggregation (e.g., supported by segmentation).
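
A minimal sketch of the proposed mapping, kept separate from the construction of the adata, could look like this. It uses only plain Python lists; the function name and the input format (one list of input spot indices per mesh cell) are assumptions for illustration. The returned triplets are in coordinate (COO) form, so they could be handed to a sparse matrix constructor such as scipy.sparse.coo_matrix before being stored in adata_out.uns.

```python
def build_unit_mapping(joined_indices, obs_names):
    """Map each mesh cell to the input spatial units merged into it.

    joined_indices[i] lists the positions (into obs_names) of the
    spatial units that were merged into mesh cell i.
    """
    rows, cols = [], []
    for mesh_cell, spot_indices in enumerate(joined_indices):
        for spot in spot_indices:
            if not 0 <= spot < len(obs_names):
                raise IndexError(
                    f"spot index {spot} out of range for {len(obs_names)} units"
                )
            rows.append(mesh_cell)
            cols.append(spot)
    data = [1] * len(rows)  # 1 denotes "unit belongs to this mesh cell"
    return {
        "indices_joined_spatial_units": (rows, cols, data),
        "spatial_units_obs_names": list(obs_names),
    }


# Made-up example: four input units merged into three mesh cells.
obs_names = ["AAAC", "AAAG", "AAAT", "AACA"]
joined_indices = [[0, 1], [2], [1, 3]]
mapping = build_unit_mapping(joined_indices, obs_names)
```

Because the mapping is computed by its own function, other aggregation schemes (for example segmentation-based ones) could produce the same structure and reuse the downstream adata construction.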

MissingInputException when merging with 0.7.2

When I try to merge two samples with spacemake 0.7.2, I first run:

spacemake projects merge_samples \
--merged_project_id <project> --merged_sample_id <sample_merged> \
--project_id_list <project> \
--sample_id_list <sample_a> <sample_b> 

(the IDs were replaced by placeholders in angle brackets)

Then, when running spacemake run, it exits upon the following exception:

MissingInputException: Missing input files for rule symlinks:
    MissingInputException: Missing input files for rule link_raw_reads:
    MissingInputException: Missing input files for rule demultiplex_data:

Add support for updating R1/R2 in update_sample function

There's currently no functionality for that:

$ spacemake projects update_sample
usage: spacemake projects update_sample [-h] --project_id PROJECT_ID --sample_id SAMPLE_ID [--barcode_flavor BARCODE_FLAVOR] [--adapter_flavor ADAPTER_FLAVOR] [--species SPECIES]
                                        [--map_strategy MAP_STRATEGY] [--puck PUCK] [--puck_barcode_file_id PUCK_BARCODE_FILE_ID [PUCK_BARCODE_FILE_ID ...]]
                                        [--puck_barcode_file PUCK_BARCODE_FILE [PUCK_BARCODE_FILE ...]] [--run_mode RUN_MODE [RUN_MODE ...]] [--investigator INVESTIGATOR] [--experiment EXPERIMENT]
                                        [--sequencing_date SEQUENCING_DATE]

project_id should encompass sample_id

Attempting to change the species from mouse to human for all samples within a project doesn't work.
spacemake projects update_sample --project_id SP115 --species human gives an error

spacemake projects update_sample: error: the following arguments are required: --sample_id

Since project_id contains all sample_ids, do we want to go ahead with the change without specifying sample_ids?
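
One way to implement this would be to treat a missing sample_id as "all samples of the project". The sketch below uses a plain dict keyed by (project_id, sample_id) as a stand-in for the project table; the function name and the data layout are assumptions for illustration, not spacemake's internals.

```python
def update_samples(samples, project_id, sample_id=None, **updates):
    """Update one sample, or every sample of a project if sample_id is None.

    samples maps (project_id, sample_id) -> dict of sample settings.
    Returns the list of keys that were changed.
    """
    targets = [
        key for key in samples
        if key[0] == project_id and (sample_id is None or key[1] == sample_id)
    ]
    if not targets:
        raise KeyError(f"no sample matches project_id={project_id!r}, "
                       f"sample_id={sample_id!r}")
    for key in targets:
        samples[key].update(updates)
    return targets


# Made-up project table.
samples = {
    ("SP115", "s1"): {"species": "mouse"},
    ("SP115", "s2"): {"species": "mouse"},
    ("other", "s1"): {"species": "mouse"},
}
changed = update_samples(samples, "SP115", species="human")
```

Returning the list of changed keys also makes it easy to report exactly which samples were touched, and raising on an empty match avoids silently accepting IDs that do not exist.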

Minor improvements / additions / corrections

Several thoughts and minor stuff to consider:

  • Can we give the user the option NOT to index but to import an indexed genome?
  • spacemake config add_run_mode currently returns an error instead of listing the variables.
  • Same with spacemake config update_run_mode
  • spacemake projects set_run_mode keeps the default run mode in the list. Why do we need that? Maybe the default should be removed once the user sets a mode? Removing it with remove_run_mode works fine though.
  • In statements such as (average 106608.958 reads/second), just round and drop the decimals to reduce clutter. Same where we report time.
  • During mapping there's an error [E::idx_find_and_load] Could not retrieve index file for '/dev/stdin'. Is this a problem?
  • Should we make STAR verbose so we can see the mapping progress? Right now we only report the gene tagging.
  • spacemake projects merge_samples doesn't inherit barcode_flavor (spacemake v.0.0.2)
  • samtools merge has a -@ flag for multi-threading. We should definitely use that.
  • When merging starts, print a line saying something like merging sample1 and sample2
  • Sambamba throws an error sambamba: error while loading shared libraries: cannot open shared object file: No such file or directory but this doesn't seem to affect the run.

Cannot run spacemake after updating run_mode

After deleting the projects directory and even the species, spacemake refuses to run (projects list and config run fine and give the expected output). The error is below:

$ spacemake run --cores 30
Building DAG of jobs...
WorkflowError in line 64 of .../lib/python3.9/site-packages/spacemake/snakemake/dropseq.smk:
Function did not return str or list of str.

Fix add_barcode_flavor issues

  • if sample_id is missing, by default it adds the flavor to all samples in the project. The message should list the sample_ids that have changed, currently it says and for samples: ['']
  • if random names are inputted, it doesn't throw an error but instead returns a normal message like Setting barcode flavor: scrnaseq for projects: ['KO'] and for samples: ['this_shouldnt_run']
    SUCCESS: barcode_flavor set successfully.

Support for FFPE?

Hi, Thank you for making this user-friendly package. I am wondering if there is support for Visium FFPE data coming? Thanks.

