Last updated: 2025-04-09


Knit directory: GradLog/


The command set.seed(20201014) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.


The results in this page were generated with repository version d380601. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory

Unstaged changes:
    Modified:   analysis/Log2024_new_beginning.Rmd
    Modified:   analysis/week_log.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/week_log.Rmd) and HTML (docs/week_log.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author   Date        Message
html  ff0bca7  liliw-w  2024-04-24  Build site.
html  9fd6e5f  liliw-w  2024-04-24  Build site.
Rmd   8173962  liliw-w  2024-04-24  wflow_publish("analysis/week_log.Rmd")
html  151bee4  liliw-w  2024-04-22  Build site.
Rmd   b7a0bd9  liliw-w  2024-04-22  test toc render
html  423f69e  liliw-w  2024-04-22  Build site.
html  1cf9a35  liliw-w  2024-04-22  Build site.
Rmd   2448521  liliw-w  2024-04-22  rename
html  ae55f7a  liliw-w  2024-04-22  test toc render
Rmd   f595c33  liliw-w  2024-04-22  rename
html  76794f9  liliw-w  2024-02-20  Build site.
Rmd   cd27671  liliw-w  2024-02-20  test
html  2b44f3b  liliw-w  2024-02-20  Build site.
Rmd   ab953a4  liliw-w  2024-02-20  test
html  79511af  liliw-w  2024-02-20  Build site.
Rmd   b4b7969  liliw-w  2024-02-20  test
html  355d984  liliw-w  2024-02-20  Build site.
Rmd   d469ed5  liliw-w  2024-02-20  test
html  387ac23  liliw-w  2024-02-20  Build site.
Rmd   85299ae  liliw-w  2024-02-20  test
html  d8c2606  liliw-w  2024-02-20  Build site.
Rmd   26506ca  liliw-w  2024-02-20  test
html  c775adf  liliw-w  2024-02-20  Build site.
Rmd   f3d506d  liliw-w  2024-02-20  test
html  8d7f0d8  liliw-w  2024-02-20  Build site.
Rmd   9b5c76e  liliw-w  2024-02-20  test
html  46b8cba  liliw-w  2024-02-20  Build site.
Rmd   04c5671  liliw-w  2024-02-20  test

April 09

Progress summary

  • Got familiar with and set up the cluster.

  • Looked at the serial biopsies data.

GENOMIC_SPECIMEN.csv

Dataset Info

  • Each row is a sample. Multiple samples can come from one patient. These samples are profiled at different time points.

  • A patient’s samples can be tested with more than one TEST_TYPE (3%).

Filter samples and patients

  • Only consider patient specimens tested with OncoPanel from PROFILECOHORT, by filtering TEST_TYPE == 'ONCOPANEL_PROFILECOHORT'.

  • This leaves 27,148 specimens from 26,307 unique patients; 773 patients have multiple samples tested.

  • Of these, 719 patients were tested twice. The remaining 54 patients were tested 3 to 6 times.

  • I keep only samples from patients who were sampled exactly twice, which leaves 1,438 samples (719 × 2). [This corresponds to what we mentioned about the data - “~1000 patients”, “sequenced twice”]

Figure: Number of samples each patient has.
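
A minimal sketch of the filtering steps above on made-up data, in base R so it runs without extra packages. TEST_TYPE and its value come from this log; PATIENT_ID and SAMPLE_ID are hypothetical column names, and the toy rows are not real data.

```r
# Toy stand-in for GENOMIC_SPECIMEN.csv.
# PATIENT_ID and SAMPLE_ID are hypothetical column names (assumption);
# TEST_TYPE is the column named in the log.
specimen <- data.frame(
  PATIENT_ID = c("P1", "P1", "P2", "P3", "P3", "P3", "P4", "P4"),
  SAMPLE_ID  = paste0("S", 1:8),
  TEST_TYPE  = c(rep("ONCOPANEL_PROFILECOHORT", 7), "OTHER_TEST"),
  stringsAsFactors = FALSE
)

# Step 1: keep only OncoPanel specimens from PROFILECOHORT.
onco <- specimen[specimen$TEST_TYPE == "ONCOPANEL_PROFILECOHORT", ]

# Step 2: count samples per patient.
n_samples <- table(onco$PATIENT_ID)

# Number of patients with more than one tested sample.
n_multi <- sum(n_samples > 1)

# Step 3: keep only patients sampled exactly twice.
twice_ids <- names(n_samples)[n_samples == 2]
paired <- onco[onco$PATIENT_ID %in% twice_ids, ]
```

With the real table, nrow(paired) should equal twice the number of patients in twice_ids, which is a quick sanity check on the 1,438 figure.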

Patient classification

  • To classify each patient with two samples, I use the IS_MATCHED_FLG flag (normal or tumor sample) together with REPORT_DT (date the report was signed out by the pathologist) to order the two samples, which results in four categories.

patient_category  n_patients
normal-normal             13
normal-tumor               1
tumor-normal              14
tumor-tumor              691

Figure: Number of patients belonging to each category.
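
One way the classification could be sketched in base R, on made-up data. REPORT_DT is the column named in this log. PATIENT_ID is a hypothetical column name, and sample_type is a hypothetical stand-in for the normal/tumor label derived from IS_MATCHED_FLG, whose exact encoding is not recorded here.

```r
# Toy paired samples (two per patient).
# sample_type is an assumed, already-decoded version of IS_MATCHED_FLG.
paired <- data.frame(
  PATIENT_ID  = c("P1", "P1", "P2", "P2", "P3", "P3"),
  REPORT_DT   = as.Date(c("2020-05-01", "2019-01-10",
                          "2018-03-02", "2021-07-15",
                          "2017-06-30", "2017-01-01")),
  sample_type = c("normal", "tumor", "tumor", "tumor", "tumor", "normal"),
  stringsAsFactors = FALSE
)

# Order each patient's samples by report date, earliest first.
paired <- paired[order(paired$PATIENT_ID, paired$REPORT_DT), ]

# Join the two labels in date order, e.g. "tumor-normal".
category <- tapply(paired$sample_type, paired$PATIENT_ID,
                   paste, collapse = "-")

# Number of patients in each category.
category_counts <- table(category)
```

This sketch does not handle two samples that share the same REPORT_DT; in that case the order within the pair is arbitrary and would need a tie-breaking rule.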

Questions -

  • There is no SAMPLE_COLLECTION_DT field recording the exact date each sample was collected, so it is unclear which sample was collected first.

  • Does “tumor-tumor” mean two samples were collected before and after treatment?

  • Why do normal samples have non-zero tumor purity? Some samples also have negative tumor purity.

Figure: Normal samples have non-zero tumor purity.

TUMOR_PURITY change across patients’ two biopsies

  • TUMOR_PURITY - Estimated percentage of neoplastic cells in the sample.

Figure: TUMOR_PURITY changes across the two biopsies of each patient, by patient category.

GENOMIC_MUTATION_RESULTS.csv

Filter samples and variants

  • Use samples from above

  • Only SNP variants

  • Keep only patients whose two samples both have mutation records (some patients have mutation info for only one sample)

  • This results in 650 patients

Figure: Number of patients belonging to each category.
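
A rough sketch of one possible reading of the variant filter, in base R on made-up data. PATIENT_ID, SAMPLE_ID and VARIANT_TYPE are hypothetical column names, and the sketch assumes that "two samples with mutation info" is checked after restricting to SNPs; the log does not say which order was used.

```r
# Toy stand-in for GENOMIC_MUTATION_RESULTS.csv (one row per variant call).
# All three column names are assumptions for illustration.
mutations <- data.frame(
  PATIENT_ID   = c("P1", "P1", "P1", "P2", "P2", "P3", "P3"),
  SAMPLE_ID    = c("S1", "S1", "S2", "S3", "S4", "S5", "S5"),
  VARIANT_TYPE = c("SNP", "SNP", "SNP", "SNP", "DEL", "SNP", "SNP"),
  stringsAsFactors = FALSE
)

# Keep SNP variants only.
snps <- mutations[mutations$VARIANT_TYPE == "SNP", ]

# Per patient, count distinct samples that have at least one SNP record.
n_samples_with_snp <- tapply(snps$SAMPLE_ID, snps$PATIENT_ID,
                             function(x) length(unique(x)))

# Keep patients for whom both samples have SNP records.
keep_ids <- names(n_samples_with_snp)[n_samples_with_snp == 2]
snps_paired <- snps[snps$PATIENT_ID %in% keep_ids, ]

# Number of SNP records per retained patient.
snps_per_patient <- table(snps_paired$PATIENT_ID)
```

In the toy data only P1 is retained: P2 loses its second sample once the non-SNP call is dropped, and P3 has records for a single sample.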

ALLELE_FRACTION change across patients’ two biopsies for SNPs

  • ALLELE_FRACTION - Fraction of reads for the observed allele.

  • How many SNPs does each patient have?

Figure: Number of SNP variants each patient has.

  • ALLELE_FRACTION change across patients’ two biopsies for SNPs

Figure: ALLELE_FRACTION change across patients’ two biopsies for SNPs.

Questions -

  • Should ALLELE_FRACTION be used directly, or adjusted for tumor purity as ALLELE_FRACTION / TUMOR_PURITY?
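
To make the second option concrete, here is a small base R sketch of the ratio on made-up values. It assumes TUMOR_PURITY is stored as a percentage (0 to 100) and ALLELE_FRACTION as a fraction (0 to 1); both scales are assumptions to check against the data dictionary. AF_ADJUSTED is a hypothetical column name.

```r
# Toy per-variant table; values are invented for illustration.
variants <- data.frame(
  ALLELE_FRACTION = c(0.20, 0.45, 0.10, 0.30),
  TUMOR_PURITY    = c(40, 90, 0, -5)   # includes zero and negative purity
)

# Convert purity from percent to a fraction (assumed scale).
purity_fraction <- variants$TUMOR_PURITY / 100

# Divide by purity only where purity is positive; otherwise leave NA,
# since zero or negative purity makes the ratio meaningless.
variants$AF_ADJUSTED <- ifelse(purity_fraction > 0,
                               variants$ALLELE_FRACTION / purity_fraction,
                               NA_real_)
```

This simple ratio ignores copy number, so adjusted values can exceed 1 (for example, an allele fraction of 0.6 at 50% purity gives 1.2). Samples with zero or negative purity, as seen above, would need to be dropped or handled separately.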

Misc

  • Set up Eris cluster

    • Can’t log in after creating an account? Wait 24 hours for the account to be activated.

    • Can’t submit a job to Slurm? This is a permission issue; contact the help team.

  • Access to RStudio Server and Jupyter Notebook

    • The username is lowercase.

    • Incorrect username or password? Contact the help team to restart the server.

Things to double check -

  • Results put in /data/gusev/USER/llw?

  • Dropbox lab access?

  • MPG weekly meeting, location?

  • Download Doug’s data & upload to cluster


R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
 [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
 [9] ggplot2_3.4.3   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.39         bslib_0.5.0       colorspace_2.1-0 
 [5] vctrs_0.6.3       generics_0.1.3    htmltools_0.5.5   yaml_2.3.7       
 [9] utf8_1.2.3        rlang_1.1.1       jquerylib_0.1.4   later_1.3.1      
[13] pillar_1.9.0      glue_1.6.2        withr_2.5.1       lifecycle_1.0.3  
[17] munsell_0.5.0     gtable_0.3.4      workflowr_1.7.0   evaluate_0.21    
[21] knitr_1.43        tzdb_0.4.0        fastmap_1.1.1     httpuv_1.6.11    
[25] fansi_1.0.4       highr_0.10        Rcpp_1.0.11       promises_1.2.0.1 
[29] scales_1.2.1      cachem_1.0.8      jsonlite_1.8.7    fs_1.6.2         
[33] hms_1.1.3         digest_0.6.33     stringi_1.7.12    rprojroot_2.0.3  
[37] grid_4.2.3        cli_3.6.1         tools_4.2.3       magrittr_2.0.3   
[41] sass_0.4.6        whisker_0.4.1     pkgconfig_2.0.3   timechange_0.2.0 
[45] rmarkdown_2.23    rstudioapi_0.15.0 R6_2.5.1          git2r_0.32.0     
[49] compiler_4.2.3