Four problems this week, though the first problem is very chill. The first two problems will deal with remote connections and transferring files back and forth between your local machine and a remote machine, whereas the last two problems will be working with SQL joins.
For the first two prbolems that work with the remote server, you should be connecting to the same server that you were emailed about and which you connected to in class. You should have (in theory) updated your password in class, so if you can’t remember now what you updated it to (and you didn’t get SSH keys set up), contact me and I’ll see about getting it reset.
For the last two problems, running the provided HW4.sql file in the repository will create a handful of new tables in a hw4 schema, which will show up to the left (separate from the public schema that your other tables have been appearing in). As we get to using more and more tables, I think placing them in a container like this will help you track them and keep things organized. The price you pay though is that when you refer to a table name not in the public schema, it must start with the schema name. For instance, the superhero’s table would be referred to as hw4.superheros. Given that you will frequently be aliasing the table names anyway to something shorter, this seems like a small price to pay for not having tables just scattered throughout your database. Let me know if this is causing you issues.
Problem 1: A Little Prompting
As discussed in class, many shells allow you to customize them through the editing of a simple text file. This holds true with BASH as well! If you log onto the remote server (the same one you accessed in the class activity), you may have noticed that there is already a .bashrc file sitting in your home directory (you’ll need to reveal hidden files using the ls -a flag to see it, since it starts with a dot). If you open this file in Nano (nano .bashrc), you’ll notice that there is already a lot of standard configuration placed in here, much of which is well documented with comments. Every single time a new BASH shell is opened, the contents of the .bashrc file are read and executed. So if you want to have certain aliases, environment variables, or other customization options always available whenever you open a shell, adding them to the .bashrc is how you do so!
One fun thing that can be edited in the .bashrc file is the actual prompt that you see before your cursor. This information is stored in the variable PS1, and you can actually see it set on lines 60 and 62. The exact syntax and options for configuring this can be messy (especially if you want fun colors), but thankfully folks have gone to the effort of building an online bash prompt generator here. Create a custom prompt of your choosing using the available building blocks in the generator. Clicking on an element once you’d added it will also let you adjust its color and other attributes. Create something that you like and then copy the necessary command to the end of the .bashrc document (this will just ensure it overrides anything defined earlier in the file). Include a copy of your chosen prompt in the provided Prob1.md template file and take a screenshot showing your prompt active in the remote connection to submit this. Please just upload the screenshot to the GitHub repository, don’t embed it in the markdown.
If you want to do the same on your local system (totally optional), you absolutely can! If you are using MacOS and thus are using ZSH instead of BASH, the relevant file is (unsurprisingly) .zshrc and there is an online prompt builder here.
Problem 2: Finding the Right Words
In other circumstances, servers might hold particular software or have greater resources available to them, and thus you want to run some program on the server. In many cases, you’ll have local data on your own system that you want to upload to the server, run a program, and then bring the results back to your local system. On the same remote server you accessed in class (and in the above problem), there is a program installed called word_finder which accepts a single filename as an argument, where that file should contain a square, compact block of letters that comprise a classic word search. You can look at the contents of puzzle_small.txt in the repository as an example. When run with a given text file, this program will do 3 things:
- output to standard output a table of all the words found within the puzzle and their corresponding definitions
- save to the same location as the text file an image of the word search
- save to the same location as the text file an image of the solution to the word search
Your task here is to generate and save the results (all three outputs, including the word list) for the two example word searches included in the repository. To achieve this task, you will need to upload files to the server, run the program, and then bring the output back to your local system so that you can upload the results back to GitHub (So 6 files to upload to GitHub!) in the Prob2Data folder. Given that you will need to be transferring several files, you could do them individually, but you could also use rsync to transfer all the contents in a particular folder (for example).
Problem 3: Superpowered Joins
Running the provided HW4.sql script in the repository will have created a selection of tables all related to superheroes. I am embedding an interactive view of the relationships between the tables below. A description of each of the tables is contained below in Table 1.
| Table name | Description |
|---|---|
superheroes |
The main list of superheroes. Includes their superhero name, full/real name, a list of IDs linking to other tables, and their height (in centimeters) and weight (in kilograms) |
sexes |
The sex of the superhero: Male, Female, or N/A |
colors |
Contains color options for eye color, skin color, and hair color |
races |
Contains all the different superhero races, such as Human |
publishers |
All the publishers that publish different superhero comics |
alignments |
Shows how a superhero is aligned: Good, Neutral, or Bad |
attributes |
List of possible attributes (such as intelligence) that a superhero can possess |
hero_attributes |
This table is an intersection of attributes and superheros. A superhero can have multiple attributes, and for each attribute, they can have a value from 0 to 100 indicating their rating. This table contains those ratings in the attribute_value column. |
superpowers |
All the available abilities or superpowers that someone can have. |
hero_powers |
Lists the pairings of superhero with possible powers, since a superhero can have more than one superpower. |
Use these tables to answer the following questions:
- What percentage of bad or evil superheroes have (purely) red eyes?
- What is the average intelligence of (purely) human superheroes?
- What is the most common superpower for superheros with blond hair?
- How many unique combinations of race and sex have no corresponding superheros?
Problem 4: Literal Relations
Running the provided HW4.sql script in the repository will have also created a family_tree table in your database (within the hw4 schema). This table includes information from several generations of simulated families, including information about marriages and children. A short description of the columns is below in Table 2.
| Column | Description |
|---|---|
pid |
A unique personal identification number of an individual |
name |
The given name of the individual |
spouse_id |
The pid of this individual’s spouse. To prevent duplication, this number is only assigned to the individual that marries into the family |
parent1_id |
The pid of this individual’s first parent. These are not provided for individuals marrying into the family |
parent2_id |
The pid of this individual’s second parent. These are not provided for individuals marying into the family |
yr_birth |
The year this individual was born |
yr_death |
The year this individual died. NULL if they are still alive |
yr_married |
The year this individual married. NULL if they never married |
sex |
M for male or F for female |
Use this table to answer the following questions:
- How many married couples are represented in the data?
- What names were duplicated across the family tree, where individuals have the same name despite being different people?
- What married partners have the greatest age difference between them? What is that age difference?
- What is the youngest age at which someone became a grandparent? Who became a grandparent at this age?