We created a list of topics we wanted to learn this semester:
1. R
Working with our own data sets and stats issues as a group!! (no tutorials)
List of common statistical tests that we want to know.
On-campus speaker from “cstat”?
2. Bacterial genome assemblies
Siobhan [HPCC, checking quality, alignment to a reference on a laptop]
3. QIIME
Nico [amplicons; bacterial and fungal communities]
4. Python
Applications in Python with relevant examples: searching, moving, and isolating data
iCER representative to help us? Mon, Thurs office hours (1-2pm)
5. GitHub!
15-20 min of refresher to get more people using this
GitHub recently updated its web interface so that files can be uploaded without using the command line
WEEKLY PLAN:
Google Drive document for each week that consists of:
- What we want to do
- What someone already knows/what they can offer
- Collaboration instead of a lesson plan
We still need input from the group about the specific Python skills we want to acquire! Next week (Feb 9), we will discuss basic statistical tests, and each group member will bring tests they know how to perform to share with the group. If you want to participate in the meeting, need to be added to the shared Google Drive folder, or have suggestions for Python topics, please contact Katie Wozniak at woznia54@msu.edu
Feb 9:
We discussed the statistical tests we run frequently and the ones we would still like to learn. For the tests we use often, we ran code on actual data sets in R to understand the format of the code and the results of each test. Still on our list to do in R: non-parametric tests, and testing for homogeneity of variance before running ANOVAs. More information about the meeting and a summary can be found in the shared drive (called “Feb9Mtg: Stat Tests”).
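One of the open items above, checking homogeneity of variance before an ANOVA, can be sketched in a few lines. The example below is only an illustration: it is written in Python rather than R, and the measurements for the two treatment groups are made up. It computes the Brown-Forsythe statistic (Levene's test centered on the group medians). The function name `brown_forsythe_w` is ours, not something from the meeting. It returns only the test statistic W, which still has to be compared against an F distribution with (k - 1, N - k) degrees of freedom to get a p-value. In R, `bartlett.test` (base R) and `leveneTest` (car package) are ready-made options.

```python
from statistics import mean, median


def brown_forsythe_w(groups):
    """Return the Brown-Forsythe (median-centered Levene) W statistic.

    `groups` is a list of lists of numbers, one inner list per group.
    A large W suggests the group variances differ.
    """
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations

    # Absolute deviation of every observation from its own group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [mean(d) for d in deviations]
    grand_mean = mean(x for d in deviations for x in d)

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((x - m) ** 2
                 for d, m in zip(deviations, group_means)
                 for x in d)

    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: two treatments, five measurements each.
treatment_a = [1, 2, 3, 4, 5]
treatment_b = [2, 4, 6, 8, 10]

w = brown_forsythe_w([treatment_a, treatment_b])
print(round(w, 3))  # 2.057
```

If the variances do look unequal, a non-parametric test (for example `kruskal.test` in R) is one alternative to a standard ANOVA.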
Feb 16:
This week, we dove back into GitHub with a short tutorial (~15 minutes long). Notes on basic commands for moving files around, creating and updating branches, and updating the master branch can be found in the shared Google Drive document called “Feb16: GitHub.” Specifically, we learned:
- To make changes to a document, you edit it in a text editor (saving there only saves the file locally on your computer).
- To record those local changes with Git, you type commands in Git Shell similar to those in the notes.
- One person maintains the “master” branch, meaning they can accept or reject any changes made in separate branches.
- For example: Siobhan maintained the master branch, Katie had branch A, and Brian had branch B. Katie wanted to make changes to the document and Brian wanted to delete the document. Siobhan would review the changes made on both branches and choose to keep the branch A changes. By accepting Katie’s changes, Siobhan updated the master branch to match branch A.
Apr 5:
Today we discussed processing ITS sequences using RDP (rdp.cme.msu.edu) here at MSU. For those of you who don’t know: the internal transcribed spacer (ITS) is the spacer DNA situated between the small-subunit ribosomal RNA (rRNA) and large-subunit rRNA genes in the chromosome, or the corresponding transcribed region in the polycistronic rRNA precursor transcript. In bacteria and archaea, the ITS is located between the 16S and 23S rRNA genes. Eukaryotes, on the other hand, have two ITSs: ITS1 is located between the 18S and 5.8S rRNA genes, while ITS2 is between the 5.8S and 25S (in plants, or 28S in animals) rRNA genes. ITS1 corresponds to the ITS in bacteria and archaea, while ITS2 originated as an insertion that interrupted the ancestral 23S rRNA gene (Lafontaine et al.).
Some members have had problems using RDP to assign taxonomy to their sequences. Dr. Jim Cole walked us through how to use the RDP website to debug the problem.
Samples from: fungi, plants, soil (either ITS1 or ITS2)
Problem: RDP classifier issues through HPCC
- The classifier bundled with QIIME will not work
- Run through the website instead (rdp.cme.msu.edu) for small sets
- You can call or email the RDP staff with questions
Solution: Use the RDP website
- Choose “Cl” (the classifier)
- Cut and paste sequences from fasta
- Select Fungal Warcup (or UNITE)
- Run and see hits
- Genera -> “show assignment detail” for % of matches
- Each lineage is shown with a bootstrap confidence value (not a p-value)
- Can also do this in the command line (they have instructions online)
*The RDP website has tutorials!*
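Because the website route is meant for small sets and involves cutting and pasting from a FASTA file, it can help to pull out just the first few records before pasting. The sketch below is one way to do that in plain Python. The function names (`read_fasta`, `first_records`), the example sequences, and the limit of two records are all hypothetical. The notes do not say how many sequences the website accepts, so pick a limit that suits your data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def first_records(lines, limit):
    """Return the first `limit` records as FASTA text, ready to paste."""
    out = []
    for i, (header, seq) in enumerate(read_fasta(lines)):
        if i >= limit:
            break
        out.append(f">{header}\n{seq}")
    return "\n".join(out)


# Made-up example records; real input would come from open("your_file.fasta").
example = [
    ">sample1_ITS1",
    "ACGTTGCA",
    "GGCATTAC",
    ">sample2_ITS1",
    "TTGACCGA",
    ">sample3_ITS2",
    "CCGGAATT",
]

print(first_records(example, limit=2))
```

For a real file, passing the open file handle works the same way, e.g. `first_records(open("your_file.fasta"), limit=50)`, where the file name and limit are placeholders.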