if fast_and_unsafe:
    _LOGGER.warning("Using a concatenation method that is not 100% safe.")
    matrix_raw = pd.concat(matrix_raw, axis=1, sort=False)
    matrix_raw = matrix_raw.loc[:, ~matrix_raw.columns.duplicated()].set_index(["chrom", "start", "end"])
With this code path, the index of matrix_raw built from each sample's consensus coverage BED file is not in natural sort order (the sample-specific coverage BED files written under distributed do not have their regions in natural sort order either).
See below.
First rows of a sample-specific BED file:
In [26]: bed = pd.read_csv(f'{analysis.data_dir}/{analysis.samples[1].name}/coverage/{analysis.samples[1].name}.peak_set_coverage.bed', sep='\t', header=None)
In [27]: bed.head()
Out[27]:
      0          1          2   3
0  chr1  234880931  234882653  16
1  chr1  243268961  243270032   2
2  chr1   39845476   39846665  96
3  chr1   61865588   61866129   0
4  chr1   77594208   77594852   2
First rows of matrix_raw with distributed=False, or with distributed=True and fast_and_unsafe=False:
In [20]: df = analysis.measure_coverage(samples=s[0:2], assign=False, save=False)
In [21]: df.head()
Out[21]:
                    ATAC_KB_PID-1377_PBMC  ATAC_KB_PID-1377_Tcell
chr1:9843-10840                        18                      26
chr1:15983-16492                        3                       1
chr1:180528-181802                     18                      18
chr1:191092-192113                      8                       5
chr1:267741-268289                      8                       4
In [22]: df = analysis.collect_coverage(samples=s[0:2], assign=False, save=False)
100%|█████████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.73it/s]
In [23]: df.head()
Out[23]:
variable            ATAC_KB_PID-1377_PBMC  ATAC_KB_PID-1377_Tcell
region
chr1:9843-10840                        18                      26
chr1:15983-16492                        3                       1
chr1:180528-181802                     18                      18
chr1:191092-192113                      8                       5
chr1:267741-268289                      8                       4
First rows of matrix_raw when both distributed and fast_and_unsafe are True:
In [24]: df = analysis.collect_coverage(samples=s[0:2], assign=False, save=False, fast_and_unsafe=True)
100%|█████████████████████████████████████████████████████| 2/2 [00:00<00:00, 17.64it/s]
ngs_toolkit:atacseq:L857 (collect_coverage) [WARNING] > Using a concatenation method that is not 100% safe.
In [25]: df.head()
Out[25]:
                          ATAC_KB_PID-1377_PBMC  ATAC_KB_PID-1377_Tcell
region
chr1:234880931-234882653                     12                      16
chr1:243268961-243270032                      3                       2
chr1:39845476-39846665                       24                      96
chr1:61865588-61866129                        1                       0
chr1:77594208-77594852                        5                       2
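
One possible workaround is to re-sort the rows after the fast concatenation, sketched below. This is only an illustration and not what ngs_toolkit does: `natural_key` and `sort_regions` are hypothetical helpers, and the sketch assumes the index holds strings of the form `chrom:start-end` as in the output above. The example matrix reuses the five rows from Out[25].

```python
import re

import pandas as pd


def natural_key(region):
    """Turn 'chr1:9843-10840' into a key that sorts in natural order."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    # Split the chromosome name into digit and non-digit chunks so that
    # 'chr2' sorts before 'chr10'. Every chunk becomes a tuple of the same
    # shape, so numeric and text chunks can be compared without a TypeError.
    chunks = tuple(
        (0, int(tok), "") if tok.isdigit() else (1, 0, tok)
        for tok in re.findall(r"\d+|\D+", chrom)
    )
    return (chunks, int(start), int(end))


def sort_regions(matrix):
    """Return a copy of `matrix` with rows in natural region order."""
    # Sort row positions rather than labels, so duplicated labels are kept.
    order = sorted(range(len(matrix)), key=lambda i: natural_key(matrix.index[i]))
    return matrix.iloc[order]


# Rows copied from the fast_and_unsafe=True output above (file order).
matrix = pd.DataFrame(
    {
        "ATAC_KB_PID-1377_PBMC": [12, 3, 24, 1, 5],
        "ATAC_KB_PID-1377_Tcell": [16, 2, 96, 0, 2],
    },
    index=pd.Index(
        [
            "chr1:234880931-234882653",
            "chr1:243268961-243270032",
            "chr1:39845476-39846665",
            "chr1:61865588-61866129",
            "chr1:77594208-77594852",
        ],
        name="region",
    ),
)

sorted_matrix = sort_regions(matrix)
print(sorted_matrix)
```

Sorting on integer start and end positions rather than on the raw strings is what puts chr1:39845476-39846665 ahead of chr1:234880931-234882653. Note that this only restores the row order; it does not address whether the positional concatenation itself is safe when samples list regions in different orders.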